feat: add Iteration 3e for memory security and integrity (OWASP ASI06)#5
Merged
Conversation
Address OWASP ASI06 (Memory & Context Poisoning) with a 4-phase implementation plan based on deep research (33 sources). ROADMAP.md: - Add Iteration 3e between 3d and 4 with 4 phases: Phase 1 (input hardening), Phase 2 (trust-aware retrieval), Phase 3 (detection and response), Phase 4 (advanced protections) - Update summary section with 3e entry MEMORY.md: - Add Memory Security Analysis section with full threat taxonomy (intentional and emergent corruption vectors) - Document 9 identified gaps with severity ratings - Define 6-layer defense architecture - Catalog existing partial mitigations - Add academic and industry references SECURITY.md: - Add OWASP ASI06 classification and context - Expand attack vectors beyond PR review comments (MINJA, GitHub issue injection, experience grafting, RAG poisoning, emergent self-corruption) - Document 6-layer defense architecture requirements - Update Known Limitations with memory security gaps
krokoko
approved these changes
Apr 2, 2026
Contributor
Author
|
CI Note: The |
2 tasks
13 tasks
scoropeza
pushed a commit
to scoropeza/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 5, 2026
…rokoko review aws-samples#1, aws-samples#5, aws-samples#9, aws-samples#12) Four related fixes on the fanout + github-comment surface, from the code review on PR aws-samples#52. Grouped because they share the narrative "defense-in-depth on the fanout dispatcher" — any one landing without the others leaves a hole. ## Findings addressed **aws-samples#1 — Fanout handler returns void despite reportBatchItemFailures: true** The ``FanOutConsumer`` construct (``cdk/src/constructs/fanout-consumer.ts:146``) has ``reportBatchItemFailures: true`` on its DDB Stream event-source mapping. The handler returned ``void``, so Lambda retried the entire batch on any unhandled throw instead of isolating the poisonous record. Combined with aws-samples#5 this could cascade into retry storms and violated the per-task ordering guarantee we rely on (§6.4, AD-9). Fix: handler return type becomes ``Promise<DynamoDBBatchResponse>``. Per-record processing is wrapped in try/catch; caught throws push ``{ itemIdentifier: record.eventID }`` to ``batchItemFailures`` and emit ``fanout.record.failed`` warn. Final ``fanout.batch.complete`` log grows a ``failed`` count. Note: ``DynamoDBStreamHandler`` constrains return to ``void | Promise<void>``, so the handler is typed as a plain 3-arg async function. Lambda's runtime accepts either shape; existing tests (passing ``event, context, cb``) work unchanged. **aws-samples#5 — Unhandled exception in routeEvent crashes batch** ``routeEvent`` uses ``Promise.allSettled`` internally, but ``resolveTokenSecretArn`` can throw ``AccessDeniedException`` SYNCHRONOUSLY before the ``allSettled`` guard is reached. The new per-record try/catch from aws-samples#1 catches these too. **aws-samples#9 — renderCommentBody not self-defending against uncoerced DDB strings** The ``.toFixed(4)`` call on ``costUsd`` is the same bug class as the ``toFixed is not a function`` crash we fixed at the fanout boundary in commit 9fe704e. Today the sole call site coerces via the shared helper; a future caller that forgets to would crash. Fix: ``renderCommentBody`` coerces ``durationS`` and ``costUsd`` internally via the shared ``coerceNumericOrNull`` helper (second line of defense; caller's coercion remains the first). Widened ``CommentBodyInput`` fields to ``number | string | null`` to honestly model the DDB Document-client boundary. **aws-samples#12 — Markdown injection possible via prUrl in GitHub comment body** ``prUrl`` was interpolated directly into a Markdown link target (``[link](${input.prUrl})``). A crafted URL containing ``)`` / ``|`` / ``\n`` could break the table layout or inject content, and a ``javascript:`` scheme could produce a click-to-execute link on some Markdown renderers. Fix: new exported ``sanitizeMarkdownLinkTarget`` helper in ``shared/github-comment.ts`` rejects URLs containing ``\r\n\t\s)|]"<>`` characters, validates via ``new URL()``, and rejects non-http(s) schemes. Returns ``null`` on rejection so ``renderCommentBody`` omits the Pull-request row entirely rather than emitting a broken or unsafe link. ## Tests +22 regression tests net (fanout 7 for aws-samples#1+aws-samples#5 + 3 for aws-samples#9; github-comment 12 for aws-samples#12): - Fanout partial-batch: poison-pill isolation, mixed-batch (good record NOT in failures), observability warn, empty-failures regression guard, baseline pin that today's ``Promise.allSettled`` containment still works. - renderCommentBody numeric self-defense: string-typed values render correctly; non-finite strings collapse to null with warn; null does NOT warn. - sanitizeMarkdownLinkTarget unit tests: accept clean http/https, reject 9 injection patterns, reject 4 non-http schemes (``javascript:``, ``data:``, ``file:``, ``ftp:``), reject malformed, handle null/undefined. Plus end-to-end assertions on ``renderCommentBody`` proving the PR row is omitted on rejection. CDK suite: 1029 passing (was 1001). Refs: krokoko code review on PR aws-samples#52 (findings 1, 5, 9, 12)
isadeks
pushed a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 13, 2026
…gs (aws-samples#79 review aws-samples#5) The reaction / delete helpers (``addReaction``, ``removeReaction``, ``deleteMessage``) used to log every catch at warn with a single generic event key, lumping API-level rejections (e.g. ``no_reaction``) together with infrastructure failures (DNS lookup, TLS handshake, fetch timeout, JSON parse error from a hostile gateway). Operators who alarmed on the warn rate saw a flat signal that masked genuine infra problems. Split the boundary: - API-level (``!result.ok`` after a successful HTTP call) stays at warn with channel-specific event keys (``fanout.slack.reaction_add_api_error``, ``fanout.slack.reaction_remove_api_error``, ``fanout.slack.message_delete_api_error``). These are per-message UX problems; operators don't page. - Network errors (the outer ``catch (err)`` after ``fetch``) promote to ``logger.error`` with dedicated event keys (``fanout.slack.reaction_add_network_error``, ``fanout.slack.reaction_remove_network_error``, ``fanout.slack.message_delete_network_error``) and ``error_id``s (``FANOUT_SLACK_REACTION_NETWORK``, ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable signal. User-visible symptoms when these fire silently: stale emoji reactions (hourglass never swaps to ✅) and orphaned intermediate messages. Behaviour unchanged: errors are still swallowed (per-message reactions and intermediate cleanup are best-effort by design; they must not fail the batch), but operators now get distinct metrics for each failure class.
8 tasks
github-merge-queue Bot
pushed a commit
that referenced
this pull request
May 13, 2026
#79) * feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (#64) Move the Slack outbound delivery off its own DynamoDB Streams consumer onto FanOutConsumer as a per-channel dispatcher. Drops TaskEventsTable from 2 concurrent stream readers to 1, restoring headroom for future channels (Email, Teams, etc.) without exceeding the documented DynamoDB Streams 2-reader-per-shard practical limit. The PR also addresses an adversarial code review on the original migration; the body below walks through each piece in the order it landed. ## (a) Migration - `cdk/src/handlers/slack-notify.ts` — rewritten as exported `dispatchSlackEvent(event, ddb)` plus a tagged `SlackApiError` class. The standalone `handler(event)` stream entrypoint is gone; the FanOutConsumer is now the only TaskEventsTable stream reader. Behaviour preserved bit-for-bit: channel_source==='slack' gate, terminal-event dedup via conditional UpdateItem on `channel_metadata.slack_notified_terminal`, threaded replies under the @mention or task_created message, emoji transitions (eyes -> hourglass -> ✅/❌/🚫/⏲), DM channel_id -> user_id rewrite, intermediate session+created message cleanup on terminal events. - `cdk/src/handlers/fanout-task-events.ts` — replaces the log-only `dispatchToSlack` stub with a wrapper that calls dispatchSlackEvent and routes errors via the new typed contract (see (b) below). Slack defaults gain task_created, session_started, task_timed_out so the router fans out the lifecycle events the old SlackNotifyFn handled; the dispatcher's channel_source gate keeps non-Slack tasks unaffected. - `cdk/src/constructs/fanout-consumer.ts` — adds a scoped `secretsmanager:GetSecretValue` grant on `bgagent/slack/*` so the fanout Lambda can fetch per-workspace bot tokens. Same scope the old SlackNotifyFn role held. - `cdk/src/constructs/slack-integration.ts` — deletes SlackNotifyFn, its DynamoEventSource, its IAM policy, and its NagSuppressions entry. Drops the now-unused StartingPosition / FilterCriteria / FilterRule / lambdaEventSources imports. After this lands, `aws lambda list-event-source-mappings` shows exactly one consumer of the TaskEventsTable stream (FanOutFn); verified on the dev stack with end-to-end @mention + cancel + CLI isolation scenarios. ## (b) Review fix #1 — partial-batch retry semantics (BLOCKER) The first review pass found that the post-migration handler silently dropped Slack-side infra errors (DDB throttle on the GetItem, Secrets Manager 5xx, transient Slack timeout). Pre-migration the SlackNotifyFn handler rethrew non-SlackApiError so Lambda retried the batch; post-migration `Promise.allSettled` swallowed the rejection and routeEvent returned an empty list with no escalation path to `batchItemFailures`. routeEvent's return type changed from `NotificationChannel[]` to `{ dispatched, infraRejections }`. The handler now pushes the record into `batchItemFailures` whenever `infraRejections.length>0`, so Lambda replays the record under the partial-batch contract. The warn line on rejection is tagged `retryable: true` so operators can alert distinctly from the channel-terminal swallow path. GitHub got the symmetric treatment: 4xx (excluding the existing 401 and 404 handling) is now treated as a channel-terminal swallow via `fanout.github.api_error` instead of escalating to retry. ## (c) Review fix #2 — split SlackApiError into terminal + retryable Originally any `!result.ok` Slack response was wrapped in SlackApiError and swallowed. That collapsed retryable codes (`ratelimited`, `service_unavailable`, `internal_error`, `fatal_error`, `request_timeout`) into the same swallow as `channel_not_found` — a tier-1 Slack outage would silently drop every message. Introduced `TERMINAL_SLACK_API_ERRORS` set + `classifySlackError` helper. Terminal codes still throw SlackApiError (router swallows). Retryable codes throw a plain Error so the router classifies them as infra rejections and Lambda replays. ## (d) Review fix #3 — NOTIFIABLE_EVENTS / CHANNEL_DEFAULTS drift The original migration added task_created/session_started/task_timed_out to CHANNEL_DEFAULTS.slack but the dispatcher's NOTIFIABLE_EVENTS gate already excluded several events the router was subscribing Slack to (agent_error, pr_created, task_stranded). Result: Slack was reported as `dispatched` for events it silently dropped — telemetry lied, agent_error never reached operators on Slack-origin tasks, and task_stranded rendered the generic "Event: task_stranded for owner/repo" fallback (UX regression). Added render cases for task_stranded and agent_error in slack-blocks.ts and added them to NOTIFIABLE_EVENTS. Forward-compat approval_required and status_response stay out of NOTIFIABLE_EVENTS until their emitters ship; a new cross-file consistency test in fanout-task-events.test.ts fails if anyone re-introduces the drift. The Slack dispatcher wrapper now passes `effectiveEventType` so an agent_milestone(pr_created) wrapper is unwrapped before NOTIFIABLE_EVENTS matching. Without the rewrite, the dispatcher would short-circuit on the wrapper string `agent_milestone`. ## (e) Review fix #4 — conditional UpdateItem on lifecycle persists Once the BLOCKER fix made batches retry, the original task_created and session_started UpdateItem calls became hazardous: a Slack POST that succeeded but whose follow-up UpdateItem failed transiently would, on retry, post a second root and overwrite slack_thread_ts — orphaning every threaded reply that had threaded under the first ts. Both UpdateItems now carry an `attribute_not_exists` ConditionExpression on the relevant `channel_metadata.slack_*_msg_ts`. On ConditionalCheckFailedException the handler logs at info, deletes the duplicate Slack message via `chat.delete`, and returns. Sibling retry wins the race; the duplicate is cleaned up. ## (f) Dev-stack regression: drop pr_created from Slack defaults Live verification surfaced a UX duplication: pr_created (subscribed in CHANNEL_DEFAULTS.slack as the original §6.2 design called for) and task_completed both rendered messages with View PR buttons, posted seconds apart. The original SlackNotifyFn had silently dropped pr_created (NOTIFIABLE_EVENTS gate), so users hadn't relied on it. Removed pr_created from CHANNEL_DEFAULTS.slack and from NOTIFIABLE_EVENTS, and removed the prCreatedMessage renderer. GitHub keeps pr_created (its edit-in-place comment surface genuinely benefits from the early checkpoint). ## Verification - mise //cdk:compile — clean - mise //cdk:test — 1183 / 1183 pass (8 net-new tests added for the review fixes: NOTIFIABLE_EVENTS drift guard, retryable Slack codes, GitHub 4xx swallow, infra rejection escalation, SlackApiError swallow, task_stranded render) - mise //cdk:eslint — clean - mise //cdk:synth — confirms exactly one Lambda::EventSourceMapping on TaskEventsTable, pointing at FanOutFn - Dev-stack scenarios — @mention happy path, Cancel button, CLI submit (channel_source=api -> zero Slack dispatches, GitHub edit-in-place still fires) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(fanout): retry GitHub 403/429 instead of swallowing as terminal (#79 review #1) PR #79 review found that the new 4xx terminal-swallow path treats HTTP 403 and 429 as channel-terminal — but on GitHub these are transient rate-limit responses (403 with "API rate limit exceeded", 429 "Too Many Requests"). Under a reconciliation wave that touches many tasks, an entire window of GitHub comment updates would be permanently dropped with only a warn log. Carve out 403 and 429 from the swallow guard so they propagate as infra rejections through ``Promise.allSettled``. The record lands in ``batchItemFailures`` and Lambda replays until the rate-limit window clears (or DLQs after ``retryAttempts``). Test coverage: parametrized over 403 + 429 with a GitHubCommentError mock at the helper boundary, asserting the record's eventID surfaces in ``batchItemFailures`` rather than being absorbed. * fix(fanout): guard Slack Secrets Manager grant on a prop (#79 review #2) Every other external-service grant in FanOutConsumer (taskTable, repoTable, githubTokenSecret) is gated by ``if (props.X)``, so a deployment that hasn't onboarded the corresponding service stays free of dangling IAM permissions. The original migration broke the pattern with an unconditional ``bgagent/slack/*`` Secrets Manager grant — dev stacks without Slack onboarding ended up holding read permission on a resource pattern they never use, with a misleading ``cdk-nag AwsSolutions-IAM5`` suppression reason. Adds an optional ``slackSecretArnPattern`` prop on ``FanOutConsumerProps``; the policy statement is only attached when the prop is set. ``cdk/src/stacks/agent.ts`` now computes the ``bgagent/slack/*`` ARN inline and passes it through, mirroring the other guarded props. ``ArnFormat`` and ``Stack`` imports moved out of fanout-consumer.ts since the construct no longer needs them. No changes to live behaviour — agent.ts always passes the prop, so the IAM policy still attaches in production. The dispatcher will log-and-fail-retry on a missing pattern (covered by review #3 fix). Test gap covering the construct itself ships in a follow-up commit (test gap #34). * fix(fanout): throw on missing TASK_TABLE_NAME env var (#79 review #3) Pre-fix: when ``TASK_TABLE_NAME`` was unset on a Slack-subscribed event, ``dispatchSlackEvent`` returned silently after a warn line. The router counted Slack as ``dispatched`` and a broken stack quietly dropped every Slack notification — operators only saw it in the warn-rate metric, with no rejected-channel signal. Post-fix: throw a plain Error so the rejection propagates as an infra rejection through ``Promise.allSettled``. The router pushes the record into ``batchItemFailures``, Lambda retries the batch, the ``fanout.dispatcher.rejected`` warn fires per record, and operators get a distinct alarm. Also bumps the existing log line from ``warn`` to ``error`` and attaches an ``error_id: FANOUT_SLACK_MISSING_TASK_TABLE`` so the deployment-bug case can be distinguished from per-record failures. Test: ``throws when TASK_TABLE_NAME env var is missing`` deletes the env var, asserts the throw, asserts no DDB call was attempted (env-var guard fires first). * fix(fanout): match SlackApiError by name as well as instanceof (#79 review #7) When a bundler ever duplicates the slack-notify module (rare with NodejsFunction tree-shaking but possible if dual-bundled), two distinct SlackApiError classes coexist and ``instanceof`` against one fails for instances of the other. The dispatcher would see a foreign-class SlackApiError, fall through to the rethrow branch, and the router would treat it as an infra rejection — flipping a channel-terminal swallow into infinite Lambda retries. Add an ``err.name === 'SlackApiError'`` fallback so the swallow branch fires either way. Mirrors the duck-typed ``GitHubCommentError`` check used elsewhere in the same handler. Test: synthesise a plain Error with name === 'SlackApiError' (NOT an instance of the mock's SlackApiError class) and assert batchItemFailures stays empty — proving the swallow path catches both shapes. * fix(fanout): extend TERMINAL_SLACK_API_ERRORS with permission codes (#79 review #8) Original set omitted documented Slack permission/scope failures. Codes outside the set fall to the retryable branch, so a misconfiguration like ``ekm_access_denied`` or ``missing_scope`` would burn 3 Lambda retries before DLQ on every event — even though the failure is fundamentally a configuration bug that no retry can clear. Adds: - Permission/scope: missing_scope, ekm_access_denied, team_access_not_granted, posting_to_general_channel_denied - Payload shape: invalid_arguments Reorganized the set into commented blocks (channel-shape, auth, permission/scope, payload-shape) so future additions go in the right bucket and the rationale stays visible. Test coverage: parametrized over the full TERMINAL_SLACK_API_ERRORS set (21 codes) — every one must throw SlackApiError so the router swallows it. The existing retryable test.each remains intact and covers the negative-class case (codes outside the set throw a plain Error and escalate to retry). * fix(fanout): promote Slack reaction/delete network errors to error logs (#79 review #5) The reaction / delete helpers (``addReaction``, ``removeReaction``, ``deleteMessage``) used to log every catch at warn with a single generic event key, lumping API-level rejections (e.g. ``no_reaction``) together with infrastructure failures (DNS lookup, TLS handshake, fetch timeout, JSON parse error from a hostile gateway). Operators who alarmed on the warn rate saw a flat signal that masked genuine infra problems. Split the boundary: - API-level (``!result.ok`` after a successful HTTP call) stays at warn with channel-specific event keys (``fanout.slack.reaction_add_api_error``, ``fanout.slack.reaction_remove_api_error``, ``fanout.slack.message_delete_api_error``). These are per-message UX problems; operators don't page. - Network errors (the outer ``catch (err)`` after ``fetch``) promote to ``logger.error`` with dedicated event keys (``fanout.slack.reaction_add_network_error``, ``fanout.slack.reaction_remove_network_error``, ``fanout.slack.message_delete_network_error``) and ``error_id``s (``FANOUT_SLACK_REACTION_NETWORK``, ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable signal. User-visible symptoms when these fire silently: stale emoji reactions (hourglass never swaps to ✅) and orphaned intermediate messages. Behaviour unchanged: errors are still swallowed (per-message reactions and intermediate cleanup are best-effort by design; they must not fail the batch), but operators now get distinct metrics for each failure class. * fix(fanout): emit fanout.slack.dup_delete_failed on ghost-message accumulation (#79 review #6) The conditional UpdateItem dup-delete path (``task_created`` / ``session_started`` lifecycle persists) calls ``deleteMessage`` to clean up the duplicate Slack message that landed when a sibling retry won the race. The delete is inherently best-effort — but if it fails, the duplicate becomes a permanent ghost in the thread and operators had no way to alarm on the rate. Refactor ``deleteMessage`` to return a boolean (``true`` on success or ``message_not_found``-as-already-gone, ``false`` otherwise) and emit a dedicated ``fanout.slack.dup_delete_failed`` event with an ``error_id: FANOUT_SLACK_DUP_DELETE_FAILED`` from the dup-delete callsites when the cleanup couldn't complete. The terminal-event cleanup paths (``slack_session_msg_ts``, ``slack_created_msg_ts``) intentionally don't fire this event — those paths target genuinely-stale UX cleanup, not retry-driven duplicates, so an alarm there would be noise. No new tests beyond the existing dup-delete coverage; the ``deleteMessage`` return value isn't yet asserted at the unit level, but the behavior is fully exercised by the existing ``dup-delete`` integration paths (test gap #31 will add an explicit failure-path assertion when it lands). * chore(fanout): tighten RouteOutcome arrays to ReadonlyArray (#79 review #9) ``RouteOutcome.dispatched`` and ``infraRejections`` were typed as plain ``NotificationChannel[]`` — which made ``readonly`` on the property prevent reassignment but still allow callers to mutate the underlying array via ``.push``, ``.splice``, or ``.sort``. Inconsistent with the ``ReadonlySet<string>`` used for ``CHANNEL_DEFAULTS`` in the same file. Tightening to ``ReadonlyArray<NotificationChannel>`` makes the contract honest: the router owns the arrays, callers read them. Test suite updated to use ``[...outcome.dispatched].sort()`` where it previously called ``.sort()`` directly — the explicit copy makes the intent clear and would have surfaced any silent test-side mutation. * refactor(fanout): make SlackDispatchEvent a type alias of FanOutEvent (#79 review #10) The two interfaces were structurally identical: same five fields, same readonly modifiers, same metadata shape. The decoupling was purely nominal and a silent-drift footgun — adding a field to ``FanOutEvent`` (e.g. when the router starts plumbing an ``approval_required`` ID through) would not flow into ``SlackDispatchEvent``, leaving the dispatcher unaware until a downstream test happened to fail. Replace with a one-line type alias: export type SlackDispatchEvent = FanOutEvent; The slack-notify module now type-imports ``FanOutEvent`` from fanout-task-events. ``import type`` is erased at compile time, so the runtime bundle still has the one-way dep (fanout-task-events → slack-notify) — no module-cycle hazard. Reviewer-suggested ``Pick<FanOutEvent, 'task_id' | …>`` was considered and rejected: the dispatcher uses every field of ``FanOutEvent``, so the Pick would just enumerate the same five fields with extra noise. A direct alias keeps the intent obvious and prevents drift identically. * fix(fanout): generalize Slack dedup to cover agent_error + log Retry-After (#79 review #4) PR #79 review #4 surfaced a sibling-channel-failure hazard: when GitHub or Email rate-limits, the record lands in ``batchItemFailures``. On the Lambda retry, every Slack-subscribed event for that record runs again. Terminal events were already guarded by ``slack_notified_terminal``; ``agent_error`` was not — operators would page twice on a single agent failure if a sibling channel happened to fail. Generalize the dedup mechanism. ``TERMINAL_EVENTS`` is replaced by a ``SLACK_DEDUP_ATTRIBUTE`` map that marks each event type with the ``channel_metadata`` attribute that should guard the post: - 5 terminals share ``slack_notified_terminal`` (any first-arriving terminal claims the right; subsequent terminals dedup against it) - ``agent_error`` gets its own ``slack_dispatched_agent_error`` so a duplicate agent_error doesn't reuse the terminal slot - ``task_created`` / ``session_started`` map to ``null`` because they already use the per-event ``slack_*_msg_ts`` conditional persists from review #1 — the conditional already provides full idempotency (a separate marker would be redundant) Also surfaces Slack's ``Retry-After`` header on rate-limited responses through a dedicated ``fanout.slack.retryable_api_error`` warn so operators reading CloudWatch can see the recovery window instead of guessing from sustained warn rate. Tests: - logs Retry-After header on rate-limited Slack responses (new): asserts ``retry_after_seconds`` propagates from Slack's response header into the warn metadata - existing terminal-codes parametrized test untouched (terminal branch doesn't read headers) - existing retryable test gains a ``headers: { get: () => null }`` stub on the fetch mock so the headers.get call doesn't crash Reviewer suggested a per-channel dispatch bitmap as the alternative. Rejected as premature: the duplicate-GitHub-PATCH is harmless (idempotent), Email is still a stub, and the dedup map covers the specific agent_error pain identified above. A bitmap would add a new table + IAM grants + per-dispatch DDB cost for a hypothetical problem (Slack rate-limiting AND a sibling channel failure). * test(fanout): conditional UpdateItem race + dup-delete coverage (#79 test gap) Adds 4 tests covering the lifecycle-persist conditional path that review fix #1 introduced and review fix #6 hardened. Pre-PR-#79 the only ConditionalCheckFailed coverage was the terminal-dedup path; the new lifecycle-persist + dup-delete code lacked direct assertions and was flagged 9/10 criticality by the reviewer. - task_created persist ConditionalCheckFailed → posts duplicate then deletes it: pins the cleanup behaviour that prevents ghost task_created posts in the channel - session_started persist ConditionalCheckFailed → posts duplicate then deletes it: parallel coverage for the other lifecycle attribute (slack_session_msg_ts) - dup-delete failure emits fanout.slack.dup_delete_failed with error_id: pins the operator-alarm signal added in review fix #6; asserts both the event key and the FANOUT_SLACK_DUP_DELETE_FAILED error_id propagate - chat.delete returning message_not_found is treated as success (no dup_delete_failed): negative-class assertion. Prevents false-positive alarms when the race resolves cleanly (the duplicate was already deleted by a prior retry). The ghost / message_not_found tests use ``fetchMock.mockImplementation`` URL-routing rather than ``.mockResolvedValueOnce`` chains because ``updateReaction`` issues 2-3 reaction-API fetches between chat.postMessage and chat.delete; routing by URL keeps the test focused on the load-bearing chat.delete behaviour without coupling to reaction call order. * test(fanout): cover task_stranded + agent_error renderers (#79 test gap #32) Pre-PR-#79 the new ``taskStrandedMessage`` and ``agentErrorMessage`` helpers in slack-blocks.ts had no direct unit tests. Reviewer flagged this as a 7/10 gap because the renderers carry the prior_status / error_type / message_preview metadata threaded through from the event source — silent drift in the metadata field names would produce ugly fallback messages in production. Adds 5 tests: - task_stranded WITH metadata renders the prior_status parenthetical (``Task stranded for org/repo (last status: RUNNING)``) so operators can tell at a glance whether the task hung in HYDRATING vs RUNNING — without the parenthetical the reviewer's "generic Event: ..." UX regression would resurface. - task_stranded WITHOUT metadata still renders cleanly (legacy events written before the reconciler started stamping metadata must not crash or leak ``undefined``). - agent_error with full metadata (error_type + message_preview) renders the rotating_light, type, and preview. - agent_error WITHOUT metadata stays sensible — no leaked ``undefined`` strings or empty ``_Type:_`` line. - agent_error truncates a 500-char message_preview to keep Slack channel UX readable. * test(fanout): cover agent_error dedup + dedup-slot isolation (#79 test gap #33) Pre-PR-#79 review-fix #4 there was no direct test for the ``slack_dispatched_agent_error`` dedup attribute or its interaction with the existing ``slack_notified_terminal`` slot. A future refactor that collapsed the two slots — or renamed one of them — would silently break the sibling-channel-failure-retry guarantee that fix #4 added. Adds 4 tests: - ``agent_error claims its own dedup attribute``: pins the UpdateExpression and ConditionExpression strings so a refactor that renames the attribute breaks loudly. - ``agent_error retry hits the dedup guard``: end-to-end scenario matching review #4 — task already has ``slack_dispatched_agent_error: true``, retry must short-circuit before chat.postMessage. Without the guard, a second rotating_light fires. - ``terminal dedup attribute is per-class``: a flaky task_completed-then-task_failed sequence dedups against the same ``slack_notified_terminal`` slot. Catches the regression where the orchestrator emits both terminal types and we'd otherwise post both ✅ and ❌. - ``agent_error and terminals use distinct dedup slots``: the important negative — having ``slack_dispatched_agent_error`` set must NOT shadow a subsequent ``task_completed``. Pins the slot separation so a future merge into a single slot can't silently drop terminals after an agent_error. * test(fanout): add construct-level tests for FanOutConsumer (#79 test gap #34) The construct shipped on issue #64 with no unit-level coverage of its IAM contract. The only synth-level signal lived inside ``slack-integration.test.ts`` ("0 EventSourceMapping") which proved the migration didn't regress the OTHER construct. Reviewer flagged this 6/10 — and the gap is what allowed review #2 (unconditional Slack secret grant) to slip through in the first place. Adds 6 tests: - ``attaches a single DynamoEventSource on the TaskEventsTable stream``: pins the architectural invariant — issue #64 was fundamentally about reaching exactly-one stream reader. Adding a second consumer must fail this test loudly. - ``creates a DLQ for the fanout Lambda``: pins retention period + presence; a DLQ-less deployment would silently drop poison-pill records past retryAttempts. - ``omits the bgagent/slack/* grant when slackSecretArnPattern is not provided``: the review #2 invariant. Iterates every IAM::Policy and asserts NONE of them grant secretsmanager:* on a bgagent/slack/* ARN. A regression that re-introduces the unconditional grant breaks this test. - ``attaches the bgagent/slack/* grant only when slackSecretArnPattern is provided``: the positive case. Pins the grant shape (action, effect, resource pattern). - ``passes TASK_TABLE_NAME env var when taskTable is provided``: review #3 dependency — the dispatcher throws on missing env. - ``omits TASK_TABLE_NAME env var when taskTable is not provided``: graceful degrade for dev stacks that haven't onboarded the TaskTable yet (matches the construct's documented contract). * test(fanout): cover task_stranded through terminal dedup (#79 test gap #35) The reconciler at handlers/reconcile-stranded-tasks.ts:170 emits BOTH ``task_stranded`` and ``task_failed`` for a heartbeat-expired task — one for the operator signal, one to drive the FAILED status transition. Pre-PR-#79 this pair had no test coverage; reviewer flagged this 8/10 because the visible failure mode (a paired "Task stranded" + "Task failed" double-page in Slack) would surface in production but be silent in CI. Adds 2 tests: - ``task_stranded posts and writes the terminal dedup marker on first arrival``: pins that task_stranded participates in the shared terminal slot and renders the warning message with metadata. Catches a regression that omits task_stranded from the dedup map entirely. - ``task_stranded after a sibling task_failed dedups``: the operational scenario — task_failed already claimed ``slack_notified_terminal``; the subsequent task_stranded must short-circuit before chat.postMessage. Without this guard, operators get the double-page the reviewer warned about. * fix(fanout): re-read TaskRecord before terminal cleanup to close orphan-message race Live observation during PR #79 review verification: the same Slack @mention happy path sometimes leaves the 🚀 task_created message in the thread (orphaned beside the ✅ task_completed) and sometimes deletes it cleanly. The race window: 1. ``task_created`` stream batch posts the rocket message and persists ``slack_created_msg_ts`` via the conditional UpdateItem introduced in PR #79 review fix #1. 2. ``task_completed`` stream batch fires ~30s later. Its initial GetItem races the prior UpdateItem and sees a stale ``channel_metadata`` WITHOUT ``slack_created_msg_ts``. 3. The terminal cleanup branch checks ``channelMeta.slack_created_msg_ts`` — undefined — silently skips the chat.delete. The rocket message stays in the thread. Add a fresh GetItem inside the TERMINAL_EVENTS cleanup branch, after the dedup UpdateItem has linearized our view of the table. Any prior ``slack_*_msg_ts`` writes are visible by then, so the cleanup fires correctly. On a re-read failure (DDB throttle / transient blip) we fall back to the dispatch-entry snapshot and emit ``fanout.slack.cleanup_reread_failed`` so operators can alarm on the rate. Pre-existing race (the unconditional UpdateItem in pre-PR-#79 was the same shape — wrote, GetItem on the next batch could miss it). PR #79 doesn't introduce it but doesn't fix it either; this commit does, since the live screenshot evidence appeared during review verification. Tests: - ``terminal cleanup re-reads TaskRecord``: scripts a stale dispatch-entry GetItem followed by a fresh re-read GetItem with ``slack_created_msg_ts`` present; asserts chat.delete fires against the freshly-read ts. - ``terminal cleanup falls back to dispatch-entry snapshot when re-read fails``: defense-in-depth — DDB throttle on the re-read must not break terminal delivery; cleanup uses the entry snapshot and emits the fallback warn. --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
3 tasks
isadeks
pushed a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 26, 2026
Addresses the blocker + critical items from PR review: - Refresh-token race (review blocker). Linear rotates refresh_tokens on every use; concurrent Lambdas/agents racing the same secret will all read the same expiring token and one's refresh will succeed while the others get `invalid_grant`. On `invalid_grant`, re-read the secret from Secrets Manager (bypassing cache). If the refresh_token has changed, another caller already rotated; use the freshly-read token (or retry refresh once if it's also expiring). If unchanged, the refresh_token is permanently rejected and the workspace needs re-onboarding. Implemented in both the TS resolver (linear-oauth-resolver.ts) and Python resolver (config.py). - Unguarded bedrock_agentcore import in agent/src/server.py (review critical). The bare `from bedrock_agentcore.runtime.context import BedrockAgentCoreContext` inside `_run_task_background` killed the entire pipeline thread with no diagnostic if the SDK was missing or its module structure changed. Wrap in try/except (ImportError, AttributeError) and log via _warn_cw — the Linear token resolver has its own SM fallback, so the agent can proceed without the workload-token bridge. - Cache invalidation on fetch-level refresh failure (review high). The TS resolver's `invalidateLinearOauthCache()` only ran in the `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the catch returned null without invalidating, leaving the stale expiring token cached for 60s and hammering Linear's token endpoint. Move invalidate into the fetch-level catch too. - Malformed expires_at log (review medium). The Python `_is_expiring` caught `ValueError` and silently returned True, masking consistently-bad writes. Add a WARN log so operators see the bad data instead of just an unexplained refresh on every task. - Positive-path refresh log (review non-blocking aws-samples#5). Added INFO-level breadcrumb on successful refresh in both resolvers so operators diagnosing intermittent 401s have a trace of which workspace refreshed and to what expiry. 11/11 existing resolver unit tests still pass; will add tests for the new race-recovery branch in a followup commit.
isadeks
added a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 26, 2026
Addresses the blocker + critical items from PR review: - Refresh-token race (review blocker). Linear rotates refresh_tokens on every use; concurrent Lambdas/agents racing the same secret will all read the same expiring token and one's refresh will succeed while the others get `invalid_grant`. On `invalid_grant`, re-read the secret from Secrets Manager (bypassing cache). If the refresh_token has changed, another caller already rotated; use the freshly-read token (or retry refresh once if it's also expiring). If unchanged, the refresh_token is permanently rejected and the workspace needs re-onboarding. Implemented in both the TS resolver (linear-oauth-resolver.ts) and Python resolver (config.py). - Unguarded bedrock_agentcore import in agent/src/server.py (review critical). The bare `from bedrock_agentcore.runtime.context import BedrockAgentCoreContext` inside `_run_task_background` killed the entire pipeline thread with no diagnostic if the SDK was missing or its module structure changed. Wrap in try/except (ImportError, AttributeError) and log via _warn_cw — the Linear token resolver has its own SM fallback, so the agent can proceed without the workload-token bridge. - Cache invalidation on fetch-level refresh failure (review high). The TS resolver's `invalidateLinearOauthCache()` only ran in the `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the catch returned null without invalidating, leaving the stale expiring token cached for 60s and hammering Linear's token endpoint. Move invalidate into the fetch-level catch too. - Malformed expires_at log (review medium). The Python `_is_expiring` caught `ValueError` and silently returned True, masking consistently-bad writes. Add a WARN log so operators see the bad data instead of just an unexplained refresh on every task. - Positive-path refresh log (review non-blocking aws-samples#5). Added INFO-level breadcrumb on successful refresh in both resolvers so operators diagnosing intermittent 401s have a trace of which workspace refreshed and to what expiry. 11/11 existing resolver unit tests still pass; will add tests for the new race-recovery branch in a followup commit.
isadeks
added a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 26, 2026
… 2.0b) (aws-samples#160) * feat(linear): resolve API token via AgentCore Identity (Phase 2.0a) Migrates the agent runtime's Linear personal API token resolution from AWS Secrets Manager to AWS Bedrock AgentCore Identity. This is the "validate Identity SDK" step of the v2 plan; Phase 2.0b will swap the API key for OAuth and converge Linear MCP onto AgentCore Gateway in one cutover. Per Alain's guidance: "start by using api key, if it works, switch to oauth. you will setup an outbound auth for your server using agentcore identity. that identity can be (AC identity is like a wrapper around secrets manager) api key or oauth." Lambdas (orchestrator + processor) intentionally keep using Secrets Manager via the existing `LinearApiTokenSecret` for now. The Python `bedrock_agentcore` SDK has no Node.js equivalent — Lambda migration requires `@aws-sdk/client-bedrock-agentcore` raw API calls and folds into 2.0b's bigger refactor. End-state of 2.0a: agent reads from Identity, Lambdas read from Secrets Manager, both pointing at the same underlying token value (admin populates both). `agent/src/config.py::resolve_linear_api_token`: - Drops boto3 SecretsManager fetch + `LINEAR_API_TOKEN_SECRET_ARN` env. - Reads new env `LINEAR_API_KEY_PROVIDER_NAME` (provider name in Identity vault). - Calls `IdentityClient.get_api_key()` with the workload access token auto-injected into `BedrockAgentCoreContext` by AgentCore Runtime (verified by reading the SDK's `auth.py` decorator implementation — no manual workload-identity mint needed inside the runtime). - Caches the resolved token in `LINEAR_API_TOKEN` so downstream consumers stay unchanged: `channel_mcp.py`'s `${LINEAR_API_TOKEN}` placeholder in `.mcp.json` and `linear_reactions.py`'s GraphQL Authorization header. Preserves PR aws-samples#87's nice-to-have improvements: - `ImportError` graceful fallback (now for `bedrock_agentcore` instead of `boto3`) — degrade with WARN, don't crash the agent. - `AccessDeniedException` and `ResourceNotFoundException` logged at ERROR severity (persistent IAM/config bugs that should page). Other ClientErrors stay at WARN (transient throttle/network). `agent/pyproject.toml`: adds `bedrock-agentcore==1.9.1` dep. `cdk/src/stacks/agent.ts`: - On the AgentCore runtime: drops `linearIntegration.apiTokenSecret. grantRead(runtime)` and the `LINEAR_API_TOKEN_SECRET_ARN` env-var override. Adds `LINEAR_API_KEY_PROVIDER_NAME` env (hardcoded `'linear-api-key'` for now; can parametrize later via context if multi-environment naming is needed) and IAM permissions for `bedrock-agentcore:GetResourceApiKey` and `bedrock-agentcore:GetWorkloadAccessToken`. - Lambdas (orchestrator + processor) untouched — they still grant on the Linear secret and read from Secrets Manager. - Resource scope on the new IAM is `*` for now; AgentCore Identity ARN format isn't fully standardized in public docs as of 2026-05-15. Tighten in 2.0b when OAuth migration documents the canonical resource shape. `docs/guides/LINEAR_SETUP_GUIDE.md`: adds Step 4.5 documenting the one-time `agentcore add credential --type api-key --name linear-api-key` admin command users must run alongside the existing `bgagent linear setup` wizard. Notes that Lambdas keep Secrets Manager temporarily and 2.0b will retire the dual-store setup. Starlight mirror synced. `agent/tests/test_config.py::TestResolveLinearApiToken` — 10 tests covering: cached env var fast-path; missing provider name; missing region; workload token absent (outside runtime); happy path with env-var side-effect; botocore error swallowed with WARN; SDK returns None defensively; ImportError fallback; AccessDeniedException → ERROR severity; ResourceNotFoundException → ERROR severity. 542 agent / 1271 cdk / 196 cli, all green. Lint + typecheck clean. CDK synth clean. `bedrock_agentcore` SDK confirmed working in our runtime image (verified in `node_modules` post-install). The `BedrockAgentCoreContext` workload token auto-injection is documented behaviour for code running inside AgentCore Runtime — verified by reading the SDK's `@requires_api_key` decorator implementation, which uses the same context lookup we use here. Stacked on PR aws-samples#87 (`feat/linear-processor-feedback`). Will conflict on `config.py` and `test_config.py` if aws-samples#87 needs further rework before merge — happy to rebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(linear): use aws CLI for credential provider, not the agentcore command The setup guide referenced `agentcore add credential` which doesn't actually work end-to-end: - The Python `bedrock-agentcore-starter-toolkit` CLI (`agentcore`) only exposes agent-lifecycle commands; there is no `credential-provider` subcommand. Confirmed by reading the toolkit's CLI reference and by user trying `agentcore configure credential-provider --type api-key --name ...` and receiving `No such command 'credential-provider'`. - The new npm `@aws/agentcore` CLI does have `agentcore add credential` but uses a declarative project model — the credential lands in `agentcore.json` + `.env.local`, not the actual AgentCore Identity vault, until `agentcore deploy` runs against a project structured for that CLI. ABCA isn't structured that way. Switch the docs to the plain AWS CLI which works directly against the AgentCore Identity API: aws bedrock-agentcore-control create-api-key-credential-provider \ --name linear-api-key \ --api-key "<paste lin_api_… token here>" \ --region us-east-1 Plus the matching `list-api-key-credential-providers` for verification. Add a "Tooling note" at the bottom of the section explaining why the plain AWS CLI is the right path here vs. the two `agentcore` CLIs. Starlight mirror synced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): pass runtimeUserId so AgentCore injects WorkloadAccessToken Smoke on backgroundagent-dev caught a real bug in the Phase 2.0a migration: the agent's `resolve_linear_api_token()` was correctly calling `IdentityClient.get_api_key()` but failing earlier at `BedrockAgentCoreContext.get_workload_access_token()` returning None. The Linear MCP then loaded with an unresolved `${LINEAR_API_TOKEN}` placeholder and 👀 didn't post. Root cause (from reading bedrock-agentcore-sdk-python source): The `WorkloadAccessToken` request header (which the runtime container reads to populate `BedrockAgentCoreContext`) is only injected by AgentCore Identity when `InvokeAgentRuntimeCommand` is called with `runtimeUserId`. Per AWS docs at https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-oauth.html: "Agent Runtime exchanges this token for a Workload Access Token via bedrock-agentcore:GetWorkloadAccessTokenForJWT API and delivers it to your agent code via the payload header `WorkloadAccessToken`." Without `runtimeUserId`, AgentCore never derives a workload token and the header is absent. `app.py::_build_request_context` reads the header off the inbound request; the agent sees None. Fix: 1. Thread `userId` through the `ComputeStrategy.startSession` interface (compute-strategy.ts). 2. Pass `task.user_id` (the task's Cognito sub) at the call site in orchestrate-task.ts. 3. Set `runtimeUserId: input.userId` on `InvokeAgentRuntimeCommand` in agentcore-strategy.ts. Log it alongside session_id for traceability. 4. ECS strategy accepts the new parameter to satisfy the interface; doesn't use it (ECS doesn't go through AgentCore Identity). 5. Grant the orchestrator role `bedrock-agentcore:InvokeAgentRuntimeForUser` alongside `InvokeAgentRuntime` (task-orchestrator.ts). Without this, the new `runtimeUserId` parameter would 403. Tests updated: - `agentcore-strategy.test.ts`: pin that `runtimeUserId` flows from input into the SDK command; pass `userId: 'cognito-user-1'` in 4 call sites. - `ecs-strategy.test.ts`: pass `userId` (unused by ECS) on 3 call sites. - `start-session-composition.test.ts`: pass `userId: 'cognito-test'` on 3 call sites. - `task-orchestrator.test.ts`: assert the IAM action list includes `InvokeAgentRuntimeForUser` (2 assertions). 542 agent / 1273 cdk / 196 cli — all green. Lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): two undocumented gotchas to make AgentCore Identity actually work End-to-end smoke on backgroundagent-dev surfaced two more silent failure modes after the runtimeUserId fix landed: `BedrockAgentCoreContext.get_workload_access_token()` returned None inside the pipeline thread even though the platform delivered the token on the inbound request. Cause: Python ContextVar storage is per-thread, not shared across `threading.Thread` boundaries. Our `_run_task_background` spawns a new thread for the pipeline, so any context-var the SDK's middleware sets in the request handler thread doesn't reach it. Compounding factor: the SDK's `_build_request_context` middleware only runs when using `BedrockAgentCoreApp` from `bedrock_agentcore.runtime`. Plain FastAPI apps like ours never get that bridge at all. Fix: read the workload token off the request in `_extract_invocation_params` (handling both observed header spellings — `WorkloadAccessToken` and `x-amzn-bedrock-agentcore-runtime-workload-accesstoken`), thread it through the kwargs of `_run_task_background`, and have the pipeline thread call `BedrockAgentCoreContext.set_workload_access_token` on entry. (cdk/src/stacks/agent.ts) After (1) was applied, `IdentityClient.get_api_key()` actually fired and got `AccessDeniedException: ... not authorized to perform: secretsmanager:GetSecretValue`. Cause: AgentCore Identity stores api-key credentials in Secrets Manager under reserved prefix `bedrock-agentcore-identity!*` (the actual ARN shape: `arn:aws:secretsmanager:REGION:ACCOUNT:secret:bedrock-agentcore- identity!default/apikey/<provider-name>-<hash>`). The `GetResourceApiKey` control-plane API surfaces the underlying secret to the caller, and AWS verifies the *caller* role (our runtime role) has `GetSecretValue` on the actual secret resource — not the SLR. Fix: grant the runtime role `secretsmanager:GetSecretValue` scoped to the `bedrock-agentcore-identity!*` prefix in the current account/region. Tightly scoped to Identity-managed secrets; doesn't leak read access to other Secrets Manager resources. - Runtime container reads workload token from request, propagates across thread boundary, calls IdentityClient successfully - 👀 reaction posts at +525ms after task pickup, no warnings - Linear MCP loads cleanly with the resolved token - No more `workload access token not in context` WARN - No more `AccessDeniedException` from `GetResourceApiKey` Three undocumented requirements total for Phase 2.0a (combining with the runtimeUserId fix from the prior commit): 1. Caller (orchestrator) sends `runtimeUserId` and has `InvokeAgentRuntimeForUser` IAM 2. Runtime container bridges the workload-token header into the ContextVar, with per-thread propagation if the pipeline runs in a spawned thread 3. Runtime role has `secretsmanager:GetSecretValue` on `bedrock-agentcore-identity!*` All three are silent failures on their own; missing any one returns None or AccessDenied without obvious "you forgot X" diagnostics. Will file an upstream issue against `aws/bedrock-agentcore-sdk-python` summarising all three so others don't burn the same cycles. Tests: 542 agent / 1273 cdk / 196 cli — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b): foundation — workspace registry, admin invite-user, Linear app template Wave 1 of Phase 2.0b: prereq pieces for the Linear OAuth migration. - LinearWorkspaceRegistryTable: maps Linear org-id → AgentCore credential provider name, so webhook + orchestrator Lambdas can resolve the workspace's OAuth token without knowing about provider naming. - bgagent admin invite-user: wraps Cognito admin-create-user with the right defaults and prints a base64 bundle that --from-bundle decodes into ~/.bgagent/config.json. Replaces a four-flag dance with a single paste for joining teammates. - bgagent linear app-template: prints the Linear OAuth app form values captured from the 2.0b spike — GitHub username with [bot] suffix and Webhooks ON gate the actor=app flow; misleading "Invalid redirect_uri" error is the symptom when either is missing. - USER_GUIDE roles section + joining-an-existing-deployment flow: makes the four-role lifecycle explicit (stack admin / workspace admin / repo onboarder / teammate) so a teammate landing on the docs has a clear non-admin path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(linear): rewrite setup guide for OAuth (2.0b) Replace the personal-API-key flow with the Linear OAuth `actor=app` install path verified by the 2.0b spike. Major changes: - Step 1: AgentCore credential provider via `bgagent linear oauth-register-workspace`, capturing the AWS-hosted callback URL that Linear will actually see. - Step 2: Linear OAuth app creation via `bgagent linear app-template`, documenting the GitHub-username-with-[bot]-suffix and Webhooks-ON gates that produce Linear's misleading "Invalid redirect_uri" error when missing. - Step 4: OAuth dance via the rewritten `bgagent linear setup` — ephemeral localhost HTTPS callback; no own ALB/Lambda needed since AWS proxies the OAuth flow. - Step 7: clarify that the PAK-owner auto-link becomes the setup-runner auto-link; the manual DDB mapping path stays for now until self-service `@bgagent link` ships. - New "Adding additional Linear workspaces" section for multi-workspace deployments. - New "Migration from 2.0a (PAK) to 2.0b (OAuth)" runbook. - Troubleshooting expanded to cover the Invalid-redirect_uri and 401-from-Linear scenarios surfaced in the spike. Notes the docs reference commands shipping in Wave 2 (aws-samples#63 oauth-register-workspace, aws-samples#65 setup wizard, aws-samples#67 add-workspace) — the 2.0b branch is a coherent unit and aws-samples#62 must land before those flows are wired so the docs aren't a moving target during implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): workload-token retrieval helper for AgentCore Identity (2.0b C1) The CLI runs OUTSIDE AgentCore Runtime, so the in-container ContextVar trick from 2.0a does not apply. This module gives every 2.0b OAuth-flow command a single way to obtain a workload access token: - getWorkloadAccessToken({region, workloadName, userId}) calls the data-plane GetWorkloadAccessTokenForUserId, scoping the resulting token to (workload, cognito_sub) so OAuth-token retrieval is per platform user. - decodeCognitoSub(idToken) extracts the sub claim from the cached id_token, parsing only — token validation is API Gateway's job. - DEFAULT_CLI_WORKLOAD_NAME is the deployment-time convention; the workload identity itself will be created by a follow-up CDK custom resource (aws-samples#61). Stack output 'CliWorkloadIdentityName' wires the CLI to whatever the deployed name actually is. Two SDK errors get translated into actionable remediation hints: - ValidationException: WorkloadIdentity is linked to a service — documented footgun from the spike, surfaces when the CLI is pointed at a runtime workload. - AccessDeniedException / ResourceNotFoundException — same surface treatment, with the bgagent-side checklist embedded in the message. Adds @aws-sdk/client-bedrock-agentcore + bedrock-agentcore-control as CLI deps. Pins all CLI AWS SDK clients to 3.1024.0 (matching) to keep the @smithy/core dependency graph deduplicated; mixed-version pins caused interface-collision typecheck errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): bgagent linear oauth-register-workspace (2.0b B2) Registers a Linear workspace as an AgentCore OAuth2 credential provider. The command: - Validates the workspace slug shape ([a-zA-Z0-9_-]{4,50}) so the resulting provider name fits AgentCore's 64-char limit. - Prompts for clientId + clientSecret (interactive, not echoed). - Calls CreateOauth2CredentialProvider with credentialProviderVendor= 'CustomOauth2' and explicit authorizationServerMetadata for Linear's fixed endpoints (Linear has no .well-known/openid-configuration, so vendor-discovery cannot auto-resolve). - Prints the AWS-hosted callback URL the operator pastes into Linear's app form — the AWS-side proxy that Linear actually redirects to. - Idempotent: re-running with an existing provider name fetches the callbackUrl and reports "already exists — re-using it". Smoke test against dev account (2026-05-19) revealed AWS surfaces the duplicate-name case as ValidationException (NOT ConflictException as CFN/REST conventions would suggest). Detection is by message-substring match; tests cover both the duplicate path and the "ValidationException for a non-duplicate reason" path so we don't accidentally swallow input validation errors. AccessDeniedException gets a remediation hint pointing at the 'bedrock-agentcore:CreateOauth2CredentialProvider' permission, since the most common misconfiguration is running the command as a Cognito-authenticated CLI user (no permissions) rather than as an admin/stack-deploy IAM principal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b): CLI workload identity + localhost OAuth callback server (A3) Two pieces that together let the CLI run the OAuth dance without any externally-facing infrastructure: CDK side (CliWorkloadIdentity construct, wired into the agent stack): - Creates a dedicated AgentCore Identity workload identity named `bgagent-cli`, distinct from the runtime workload (which is service- linked and cannot mint user-scoped tokens). - Allowlists `https://localhost:8443/oauth/callback` as a permitted resourceOauth2ReturnUrl. AgentCore validates browser-redirect URLs against this list, so the CLI cannot finish the OAuth dance without it. - Implementation: AwsCustomResource (no L2/L1 for AgentCore Identity in CDK as of May 2026). Idempotent — Create/Update/Delete lifecycle wired so re-deploys reconcile the allowlist and stack-deletes don't leak workload identities (50/account-region quota). - Stack outputs `CliWorkloadIdentityName` and `LinearWorkspaceRegistryTableName` so the CLI can discover them at runtime. CLI side (oauth-callback-server module): - Generates a fresh self-signed cert in /tmp via openssl on each invocation; cert is cleaned up when the server shuts down. - Starts an HTTPS listener on localhost:8443/oauth/callback, captures the first request's `session_id` query param, renders a success page, shuts down. Uses res.once('finish') to ensure the response body flushes before the listener closes — otherwise the browser hangs waiting for bytes that never arrive (caught by integration test). - Translates EADDRINUSE and timeout into actionable CliErrors. The CLI URL constant and the CDK default allowlist must agree on the exact URL string — drift would silently break the OAuth dance with "redirect_uri not allowlisted". A regression-locking test on the URL constant + matching CDK default flags the issue at unit-test time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): bgagent linear setup OAuth dance orchestration (2.0b C2/C3) Replaces the personal-API-key wizard with a 7-step OAuth flow that authorizes a Linear workspace via AgentCore Identity: 1. Resolve stack outputs (CliWorkloadIdentityName, registry table, user mapping table, webhook secret ARN). Errors loudly if any are missing — typically means the stack predates 2.0b. 2. Read Cognito sub from cached id_token. 3. Mint workload access token via getWorkloadAccessTokenForUserId. 4. Initiate OAuth dance: getResourceOauth2Token returns an authorize URL + sessionUri. customParameters: {actor: 'app'} propagates so Linear surfaces the Agent install variant of the consent screen (verified via 2.0b spike). 5. Start localhost HTTPS callback server, open browser to the auth URL, await session_id from the callback. 6. Poll getResourceOauth2Token (5s/600s) until accessToken arrives; translate sessionStatus=FAILED into a Linear-app-config remediation hint. 7. Query Linear viewer + organization with the OAuth token, persist the workspace registry row + admin user-mapping row, then prompt for the webhook signing secret if not already configured. Hard cutover from PAK: the new wizard is OAuth-only — there is no --use-pak flag. The webhook signing secret prompt remains because HMAC verification of inbound Linear webhooks is independent of how the agent calls Linear outbound. Webhook prompt is skipped on subsequent add-workspace runs by detecting the lin_wh_ prefix on the stored secret; --rotate-webhook-secret forces a re-prompt. Splits queryLinearIdentity out so both the legacy PAK auto-link helper (authorization=`lin_api_…`) and the OAuth path (authorization= `Bearer <token>`) reuse the same GraphQL query. The PAK helper stays exported to support the legacy linkage path until LinearApiTokenSecret is retired in aws-samples#70. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cdk): use full SDK v3 package name for AgentCore Identity custom resource CDK's AwsCustomResource auto-derives the SDK package name from `service` by lowercasing and dropping hyphens — `'BedrockAgentCoreControl'` becomes `@aws-sdk/client-bedrockagentcorecontrol`, which doesn't exist. The actual package is `@aws-sdk/client-bedrock-agentcore-control` (hyphens). Verified by deploy: with the lowercased mapping the Lambda backing the CR fails with "Package @aws-sdk/client-bedrockagentcorecontrol does not exist" and the stack rolls back. Switching to the full v3 package name (supported per the AwsSdkCall.service jsdoc) routes the import correctly. Verified end-to-end: `bgagent-cli` workload identity created with `https://localhost:8443/oauth/callback` on the resourceOauth2ReturnUrls allowlist, stack outputs populated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(2.0b): park diagnostic flags + tokenEndpointAuthMethods + force-reauth Smoke test against backgroundagent-dev (2026-05-19) hit a service-side bug in AgentCore Identity: USER_FEDERATION token-exchange against Linear's /oauth/token never completes. sessionStatus stays IN_PROGRESS indefinitely, no FAILED transition, no diagnostics on the wire. Verified via manual curl that Linear's token endpoint works perfectly with the same clientId/secret/scopes/code/actor=app — bug is on AWS's side. AgentCore Identity has zero token-injection APIs, so Option 3 (do OAuth ourselves + inject) is architecturally impossible. AWS support case + PAR-compatibility upstream issue aws/bedrock-agentcore-sdk-python#111 are the official fix paths. Parking the wizard work but committing the diagnostic flags we added during triage so they're available when this is unparked: - `tokenEndpointAuthMethods: ['client_secret_post']` on the provider metadata. Linear expects POST-body credentials; AgentCore defaults to Basic. Field name verified against the SDK type (`tokenEndpointAuthMethods`, not the `Supported` suffix the boto3 reference suggested). - `--verbose-poll` flag on `bgagent linear setup` — prints per-poll sessionStatus + response keys so the stuck state is visible. - `--force-reauth` flag — sets `forceAuthentication: true` on GetResourceOauth2Token to bypass cached tokens after a Linear-side revoke. - `CompleteResourceTokenAuth` call between callback capture and poll loop. Per AWS sample 09-Outbound_Auth_Self_Hosted, this is required to bind the captured session to a userId. Confirmed it's NOT what unblocks our specific bug, but is correct per spec for any USER_FEDERATION flow. Status of resume paths in memory/project_oauth_2_0b.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b-O2): direct Linear OAuth + per-workspace Secrets Manager Replaces the parked AgentCore Identity OAuth flow with a CLI-side direct OAuth dance against Linear's /oauth/token endpoint. The flow verified by the manual curl smoke test on 2026-05-19 returned a valid access_token in <100ms, so we know Linear's side works. AWS's USER_FEDERATION wrapper is broken specifically for Linear (or actor=app); see memory/project_oauth_2_0b.md for the parked-bug details and resume prompt. Architecture: - New module cli/src/linear-oauth.ts owns the OAuth helpers: generatePkce (S256), buildAuthorizationUrl (with actor=app), exchangeAuthorizationCode, refreshAccessToken, StoredLinearOauthToken JSON shape, computeExpiresAt, isAccessTokenExpiring (60s threshold), linearOauthSecretName. 19 hermetic tests (no network). - Per-workspace Secrets Manager secret bgagent-linear-oauth-<slug> holds the token JSON. CLI creates+updates at runtime via upsertOauthSecret (CreateSecret + ResourceExistsException → PutSecretValue fallback). - LinearWorkspaceRegistryTable row gains oauth_secret_arn. Lambdas resolve workspace → secret_arn → token JSON, with refresh-if-expiring. (Lambda migration is Wave C.) - bgagent linear setup is rewritten end-to-end: prompt-for-credentials → PKCE → open browser → callback captures ?code+?state → state verify (CSRF) → exchangeAuthorizationCode → query Linear viewer+org → write secret + registry row + user mapping → webhook secret prompt (unchanged from prior wizard). No AgentCore calls. No polling. No CompleteResourceTokenAuth. - Localhost callback server now exposes both AgentCore-style (session_id) and direct-Linear-style (code+state+error) shapes via a CallbackResult with nullable fields. Backward-compat with the parked AgentCore path's tests. Removals: - cli/src/agentcore-identity.ts + test (workload-token helper) - cdk/src/constructs/cli-workload-identity.ts + test (workload identity) - providerNameForWorkspace, buildLinearProviderInput, registerLinearWorkspace, initiateOauthDance, completeResourceTokenAuth, pollForOauthAccessToken, AgentCore SDK imports — all gone from linear.ts - bgagent linear oauth-register-workspace command (no AWS-side provider to register; folded into setup) - CliWorkloadIdentityName CfnOutput from agent.ts - 6 describe blocks of AgentCore-flavored tests in linear.test.ts Net change: -1100 lines, +700 lines of new direct-OAuth wiring. 286/286 CLI tests pass. 9/9 linear-integration CDK tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(2.0b-O2): space-separated OAuth scopes + --no-actor-app diagnostic End-to-end smoke test against backgroundagent-dev (2026-05-20): - The OAuth dance was failing with Linear's "Invalid redirect_uri" error even though the redirect_uri was correct. Root cause: scopes were comma-separated (`read,write,...`) instead of space-separated. RFC 6749 §3.3 mandates space; Linear surfaces the violation as the misleading "Invalid redirect_uri" error, the same misdirection we hit during the 2.0b spike. Fix: `.join(' ')` in buildAuthorizationUrl. - Adds `--no-actor-app` diagnostic flag on `bgagent linear setup`. Drops the `actor=app` query param so a stuck flow can be isolated to agent-install vs vanilla-OAuth without changing the Linear app config. Off by default; surfaces a warning when invoked. After the fix, full smoke test passed: - Browser opens to Linear consent - User authorizes, redirects to https://localhost:8443/oauth/callback - CLI captures code+state, exchanges for access_token + refresh_token - Token JSON persisted to bgagent-linear-oauth-maguireb in Secrets Manager - LinearWorkspaceRegistryTable row written with oauth_secret_arn - LinearUserMappingTable row written for the admin - Token verified against Linear's GraphQL viewer query (works) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b-O2): Wave C — migrate Lambdas + agent runtime to per-workspace OAuth Replaces every consumer of the legacy LinearApiTokenSecret PAK with the per-workspace Secrets Manager OAuth-token pattern from Waves A/B. Deploy of this commit will fully cut over the integration; the LinearApiTokenSecret construct is gone. CDK side: - New `cdk/src/handlers/shared/linear-oauth-resolver.ts` resolves workspace_id → registry row → oauth_secret_arn → token JSON → refresh-if-expiring → access_token. In-memory caches (1m TTL) on both registry rows and token JSON. Lazy refresh with PutSecretValue write-back so concurrent Lambdas see the rotated token. 11 unit tests. - linear-feedback.ts: postIssueComment / addIssueReaction / reportIssueFailure now take a {linearWorkspaceId, registryTableName} context instead of an apiTokenSecretArn. Auth header switches from bare PAK value to `Bearer ${accessToken}`. - linear-webhook-processor.ts: env vars LINEAR_WORKSPACE_REGISTRY_TABLE_NAME replace LINEAR_API_TOKEN_SECRET_ARN. safeReportIssueFailure threads the webhook payload's organizationId through to the resolver. Webhook processor now stamps `linear_oauth_secret_arn` + `linear_workspace_slug` into channel_metadata at task-creation time so the agent runtime can fetch the secret directly without a registry round-trip. - orchestrate-task.ts: notifyLinearOnConcurrencyCap reads LINEAR_WORKSPACE_REGISTRY_TABLE_NAME and the task's channel_metadata.linear_workspace_id. - LinearIntegration construct: drops apiTokenSecret + ApiTokenSecret Secrets Manager resource entirely. Webhook processor IAM now grants Get+Put on `bgagent-linear-oauth-*` Secrets Manager prefix. - Agent stack: orchestrator IAM mirrors the new prefix grant. Runtime IAM drops AgentCore Identity grants and gains Get+Put on `bgagent-linear-oauth-*`. LINEAR_API_KEY_PROVIDER_NAME env var, LINEAR_API_TOKEN_SECRET_ARN env var, and LinearApiTokenSecretArn CfnOutput all removed. Agent side (Python): - config.py::resolve_linear_api_token: rewritten to read the per-task channel_metadata.linear_oauth_secret_arn (or LINEAR_OAUTH_SECRET_ARN env fallback) via boto3.secretsmanager. Lazy refresh: if expires_at is within 60s, POST refresh_token grant to Linear /oauth/token using client_id/client_secret co-located in the secret JSON, write the rotated token back via put_secret_value, return the new access_token. - pipeline.py: passes config.channel_metadata into resolve_linear_api_token. - linear-oauth.ts (CLI): StoredLinearOauthToken schema gains client_id + client_secret fields so Lambda + agent refresh can run without per-Lambda OAuth env vars. Setup wizard writes them. Tests pruned of AgentCore Identity mocks; new tests cover the Secrets-Manager-direct path (CDK 11 + agent 6 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(orchestrator): bundle import.meta.url shim for durable-execution SDK `@aws/durable-execution-sdk-js@1.1.3`'s ESM build calls `fileURLToPath(import.meta.url)` at module load. esbuild's ESM→CJS bundling leaves `import.meta.url` undefined, crashing every invocation with `TypeError: path must be a string`. Define an identifier substitution + banner that materializes a valid file:// URL from `__filename` at runtime. Discovered while smoke-testing Wave C end-to-end on backgroundagent-dev. Refs: aws/aws-durable-execution-sdk-js#543 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): switch OAuth callback to plain HTTP localhost Per RFC 8252 §7.3, OAuth providers (including Linear) treat http://localhost as a special case that doesn't need TLS — the connection never leaves the host. The previous self-signed-cert HTTPS approach forced testers through a "connection not private" warning that scared them off mid-setup. Drops the openssl shell-out + temp-cert plumbing (~60 lines) along with the user-facing warning copy in `bgagent linear setup`. Updates the callback constants to http://localhost:8080/oauth/callback and the test suite to plain http.GET. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(agent): ruff lint + format for OAuth refresh path Five lint errors surfaced when CI ran `ruff check --fix` against the Wave C agent changes: - F401 unused `timezone` import in `config.py` (replaced with `timedelta`, which is what's actually needed) - RUF034 useless if-else in the `expires_at` ternary — both branches returned identical strings before the recompute below; flatten into a single straightforward `if expires_in: ... else: ...` block - E501 three line-length violations in `config.py` and `test_config.py` — break the long expressions onto helper-named intermediates Confirmed locally: `ruff check .` clean, `ruff format --check` clean, `pytest tests/test_config.py` 15/15 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(linear-feedback): rewrite tests against OAuth context signature Wave C migrated postIssueComment / addIssueReaction / reportIssueFailure from a (secretArn: string, ...) signature to a (ctx: LinearFeedbackContext, ...) signature, but the test file still passed bare strings — TypeScript caught it at compile time only when CI ran a full build. Three test suites failed to compile (the typecheck error blocked the whole suite, not just this file). - Mock `resolveLinearOauthToken` (the new resolver) instead of `getLinearSecret` (the old PAK fetcher). - Build a `LinearFeedbackContext` fixture with linearWorkspaceId + registryTableName, pass it everywhere SECRET_ARN was used. - Update the Authorization-header assertion to match the new `Bearer <token>` form (PAK was bare-token; OAuth is Bearer-prefixed). All 41 tests across linear-feedback, linear-webhook-processor, and orchestrate-task-feedback pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cdk): align yarn.lock with upstream main + bump table count for OAuth registry Two CI failures came together because they share a root cause: this branch's yarn.lock had drifted from upstream main during interim re-resolves, leaving an inconsistent dep tree that broke ts-jest's module resolution for @aws-cdk/mixins-preview/aws-bedrockagentcore. Restoring upstream main's yarn.lock fixes the resolution; the agent.test.ts table-count assertion then needs to bump from 12 to 13 to account for the LinearWorkspaceRegistryTable added in Phase 2.0b Wave A4. Verified locally: agent.test.ts (44/44) and github-tags.test.ts (5/5) both pass after the changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style: apply eslint --fix from CI's self-mutation guard CI runs `mise run build`, which invokes `eslint --fix` and then fails if the working tree changed (self-mutation guard). Three cosmetic lints needed applying: - Import-order: DynamoDBClient and CliError moved earlier in their files to satisfy alphabetic-by-package ordering - formatJson import added in alphabetic position in linear.ts - Three template literals with no interpolation converted to single-quoted strings in oauth-callback-server.ts and linear.ts (eslint quotes rule prefers single-quotes when no template variables are used) Pure mechanical fixes; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(oauth): refresh-token race recovery + log gaps from review Addresses the blocker + critical items from PR review: - Refresh-token race (review blocker). Linear rotates refresh_tokens on every use; concurrent Lambdas/agents racing the same secret will all read the same expiring token and one's refresh will succeed while the others get `invalid_grant`. On `invalid_grant`, re-read the secret from Secrets Manager (bypassing cache). If the refresh_token has changed, another caller already rotated; use the freshly-read token (or retry refresh once if it's also expiring). If unchanged, the refresh_token is permanently rejected and the workspace needs re-onboarding. Implemented in both the TS resolver (linear-oauth-resolver.ts) and Python resolver (config.py). - Unguarded bedrock_agentcore import in agent/src/server.py (review critical). The bare `from bedrock_agentcore.runtime.context import BedrockAgentCoreContext` inside `_run_task_background` killed the entire pipeline thread with no diagnostic if the SDK was missing or its module structure changed. Wrap in try/except (ImportError, AttributeError) and log via _warn_cw — the Linear token resolver has its own SM fallback, so the agent can proceed without the workload-token bridge. - Cache invalidation on fetch-level refresh failure (review high). The TS resolver's `invalidateLinearOauthCache()` only ran in the `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the catch returned null without invalidating, leaving the stale expiring token cached for 60s and hammering Linear's token endpoint. Move invalidate into the fetch-level catch too. - Malformed expires_at log (review medium). The Python `_is_expiring` caught `ValueError` and silently returned True, masking consistently-bad writes. Add a WARN log so operators see the bad data instead of just an unexplained refresh on every task. - Positive-path refresh log (review non-blocking aws-samples#5). Added INFO-level breadcrumb on successful refresh in both resolvers so operators diagnosing intermittent 401s have a trace of which workspace refreshed and to what expiry. 11/11 existing resolver unit tests still pass; will add tests for the new race-recovery branch in a followup commit. * fix(2.0b-O2): review-2 batch — error specificity, half-creates, runbook Addresses four PR review items focused on operator UX when things go sideways: - isWebhookSecretConfigured (review high). The bare `catch { return false }` swallowed AccessDeniedException and DecryptionFailureException, making setup re-prompt for a webhook secret when the real problem was IAM. Now: only ResourceNotFoundException returns false; everything else throws a CliError pointing the operator at the IAM gap. Test updated to assert both paths. - admin invite-user half-create (review medium). If AdminCreateUser succeeds but AdminSetUserPassword fails (stricter password policy than generator, partial IAM grant on the Set verb), the user was left in FORCE_CHANGE_PASSWORD with no diagnostic. Wrap the second call in try/catch and throw a CliError that names the user, explains the broken state, and gives both a delete-user CLI and a manual-fix path. - PAK migration runbook (review non-blocking #1). Expanded the "Migration from 2.0a (PAK) to 2.0b (OAuth)" section in LINEAR_SETUP_GUIDE.md with: a pre-deploy checklist, what survives the migration vs what doesn't, an explicit rollback note (fix forward; the original PAK secret is gone with the CFN resource), and the per-step difference between 2.0a-with-Identity (skipped) vs 2.0a-with-PAK (migrate) deploys. - Vestigial AgentCore Identity dep (review non-blocking #2). bedrock-agentcore==1.9.1 is kept in agent/pyproject.toml because the workload-token bridge in server.py still calls it (now wrapped in try/except per review batch 1). Add an inline comment explaining why it's pinned even though Phase 2.0b-O2 reads Secrets Manager directly — it's the seam for resuming the AgentCore Identity path in 2.0c. CLI tests: 13/13 pass. * fix(2.0b-O2): batch 3 — schema validation, CallbackResult union, parity contract Three remaining substantive review items from PR aws-samples#160: - Validate all 11 fields in getOauthSecret (review non-blocking aws-samples#7). Was checking only access_token / refresh_token / expires_at; missing client_id or client_secret only surfaced 24h later when the refresh call needed them and found undefined. Extracted the required-field list into a const next to the StoredOauthToken interface and check the full set at deserialization. Bad secrets fail fast at fetch time with a structured log line naming the missing fields. - CallbackResult discriminated union (review non-blocking aws-samples#6). Was `{ sessionId: string|null, code: string|null, state: string|null }` which let callers construct unreachable shapes. Split into `{ kind: 'agentcore', sessionId } | { kind: 'direct-oauth', code, state }`. Updated the resolver site (`oauth-callback-server.ts`), the consumer (`bgagent linear setup`), and the test file to use exhaustive type-narrowing. The setup wizard now errors clearly if it gets the agentcore shape (parked path) instead of silently passing nulls down. - Cross-language schema-parity contract test (review non-blocking #3). CLI's StoredLinearOauthToken and Lambda's StoredOauthToken define the same JSON-in-Secrets-Manager schema independently; drift between the two would be a silent bug (CLI writes one field name, Lambda reads another, refresh works, every Lambda invocation logs a missing-field error). New test in `cdk/test/contracts/stored-oauth-token-parity.test.ts` regex-parses both interface definitions out of source and asserts the field set is equal. Also asserts the new `STORED_OAUTH_TOKEN_REQUIRED_FIELDS` const matches the interface, so future field additions can't drift between the validator and the type. CLI tests 286/286 pass. CDK resolver + contract 13/13 pass. * style: apply eslint --fix indentation on CallbackResult union * fix(2.0b-O2): review-3 batch — defensive error handling, security hardening, refresh test coverage Bugs (B1-B3): - linear-oauth-resolver: try/catch around ddb.send() in getRegistryRow so transient DDB errors fail the resolver cleanly instead of crashing the Lambda thread - orchestrate-task: try/catch around reportIssueFailure in notifyLinearOnConcurrencyCap; Linear feedback failures must never block the rejection path - Python _fetch_token: guard json.loads + KeyError so a corrupted SM payload returns None (logged ERROR) rather than raising Tests (T1-T3): - Python: TestResolveLinearApiTokenRefreshPaths covering happy refresh, invalid_grant + concurrent rotation, invalid_grant + no rotation, malformed expires_at, network failure during refresh, and corrupted secret JSON - TS: concurrent-refresh recovery via re-read; permanent-rejection on same refresh_token; cache invalidation after network failure Security (S1-S3): - agent runtime IAM: drop secretsmanager:PutSecretValue on the Linear OAuth secret prefix. Untrusted repo code in the agent must not be able to overwrite tokens; Lambdas (trusted) handle persistence. The refreshed in-memory token still works for the current task; rotated refresh_token is lost on agent exit but Linear's grace window absorbs the rare race where the agent refreshes strictly before any Lambda - Python _try_refresh_once: narrow except Exception to (urllib.error.URLError, OSError) — programmer errors propagate with clean stack traces instead of being swallowed - linear-oauth-resolver: RegistryRowStatus is now a discriminated 'active' | 'revoked' literal; missing or unknown values fail closed to revoked rather than defaulting active * fix(2.0b-O2): use email.message.Message for HTTPError hdrs in tests ty's stricter typeshed flagged dict[Unknown, Unknown] passed where Message[str, str] is expected. Drop-in replacement that satisfies both ty and runtime. * fix(2.0b-O2): update feedback test for B2 — helper now swallows internally Round-3 review B2 moved the try/catch into notifyLinearOnConcurrencyCap. The pre-existing test asserted the old contract (rejection propagates, caller must catch). Flip the contract assertion to reflect the new behavior: the helper is now best-effort end-to-end and returns undefined on internal failure. --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
5 tasks
krokoko
pushed a commit
that referenced
this pull request
Jun 9, 2026
…ompt, cap, parity (#248) #5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload failure AFTER run_agent mutated/committed the tree stranded the work as FAILED with no PR. read_only now comes from config (already fallback-computed and the verdict that drove Cedar); the reload — needed only for the ensure_pr strategy — is wrapped in the same WorkflowValidationError fallback build_config uses, defaulting to "create" so the PR still opens. #8 — knowledge/web-research-v1 had no registered prompt and silently degraded to the generic default-agent prompt. Add a research-specialized, repo-less prompt (prompts/web_research.py) and register it. #9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap actually bounds memory on the MicroVM before materializing the bytes. #10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID / DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform default and be requires_repo:false in the YAML; the coding default must be repo-bound. Closes the last hand-maintained-copy axis #10 named. Perf — memoize load_workflow (@cache): files are image-baked and immutable per process and Workflow is frozen, so the 3-4 loads per task now parse once. Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a larger half-migrated-runner refactor both reviewers flagged as non-blocking.
krokoko
added a commit
to ayushtr-aws/sample-autonomous-cloud-coding-agents
that referenced
this pull request
Jun 10, 2026
…rkflows (aws-samples#248) (aws-samples#296) * Initial design docs * docs(workflows): address principal-review findings on ADR-014 - Validator parity: declare JSON Schema as the single canonical shape contract (consumed, not re-implemented), keep exactly one cross-field validator impl in Phases 1-3, and add a contracts/workflow-validation/ golden corpus to lock any Phase-4 second impl to parity (mirrors the cedar-parity mechanism). - Cedar principal migration: split into its own isolated PR (Phase 2a) reviewed alone with regenerated parity fixtures, landing ahead of the pr_iteration/pr_review workflow migrations (Phase 2b) that depend on it. - Repo-optionality: reframe web_research as a not-yet-runnable schema fixture (not an acceptance proof) and require the two blocking open questions (memory actorId, artifact delivery) to be resolved as a recorded ADR addendum before the Phase-0 schema is frozen. Regenerated Starlight mirrors. * feat(agent): workflow models + loader (aws-samples#248 Phase 1) Add the agent/src/workflow/ package: Pydantic models mirroring workflow.schema.json and a loader that resolves a first-party workflow id, parses YAML, and shape-validates against the canonical JSON Schema. Additive only — does not yet touch task_type (the breaking cutover lands later in Phase 1). - models.py: Workflow + sub-models; resolved_requires_repo applies the domain-derived default (coding=true, knowledge/hybrid=false). - loader.py: JSON-Schema shape validation (one canonical contract, consumed not re-implemented), YAML load, id/path agreement, and path-traversal defense. Cross-field rules deliberately live in the separate validator (Task #2), so there is one cross-field impl. - Promote pyyaml + jsonschema to direct deps (were only transitive); the loader must not depend on another package's transitive pin. - Fix a latent schema bug found by the new tests: the requires_repo/ read_only allOf conditionals fired vacuously when the property was absent, wrongly applying repo-less constraints to a coding workflow relying on the domain default. Guard each `if` with `required`. 17 new tests; agent ruff + ty + full suite (836) green. * feat(agent): cross-field workflow validator + golden corpus (aws-samples#248 Phase 1) The single CI-time implementation of the WORKFLOWS.md cross-field rules the JSON Schema cannot express (rules 1-9, 11, 12, 14). The runtime loader stays shape-only and trusts this verdict, so there is exactly one cross-field implementation in Phases 1-3 (avoids the cedar-style two-language drift). contracts/workflow-validation/ ships the golden parity corpus from Phase 1 so the expected-verdict contract is fixed before aws-samples#246 adds a second validator, mirroring contracts/cedar-parity/. test_workflow_tree_valid.py gates every first-party workflow file through the validator at CI time. * feat(agent): workflow step runner + handler registry (aws-samples#248 Phase 1) The agent-side step runner that interprets workflow.steps inside the container (ADR-014): a StepKind→handler registry, in-order execution honouring each step's on_failure policy (fail / continue / skip_remaining), per-step step:<name>:start/<status> progress milestones, and resume-aware checkpointing (deterministic side-effect-free steps recorded to workflow_state.json on the persistent mount are skipped on resume; agentic/side-effecting steps re-run idempotently per WORKFLOWS.md §Step execution semantics). Handlers are thin wrappers over the existing helpers (setup_repo, run_agent, verify_build/lint, ensure_pr) so this is maximal structural change, minimal logic change. post_review/deliver_artifact are registered but raise NotImplementedError (Phase 2b / Phase 3) — fail-loud, not silent no-op, keeping validator rule-8 handler parity honest. Orchestration core is unit-tested with fakes; wiring into pipeline.run_task is task 5. * feat(agent): first-party workflow files — coding/new-task-v1 + default/agent-v1 (aws-samples#248 Phase 1) Migrates the new_task task type to coding/new-task-v1.yaml (the heavyweight clone→build→open-PR coding path) and ships the platform default workflow default/agent-v1.yaml — the conservative last rung of the resolution ladder used when no workflow_ref and no Blueprint default apply (requires_repo:false, read-leaning tool set, deliver-as-comment). pr_iteration/pr_review migrations are Phase 2b (depend on the Cedar principal migration). Both pass the cross-field validator and the test_workflow_tree_valid CI gate (file path agrees with declared id; zero violations). * fix(agent): address code-review findings in workflow validator + runner (aws-samples#248 Phase 1) Fixes the group-A bugs surfaced by the high-recall code review of the Phase 1 commits: runner.py - Resume-skip product loss: clone_repo/hydrate_context populate in-memory products (ctx.setup, ctx.user_prompt) that can't be rebuilt from the JSON checkpoint, so skipping them on resume left ctx.setup None (breaking ensure_pr) and ctx.user_prompt empty (unguided agent). Narrow _RESUMABLE_SKIP_KINDS to verify_build/verify_lint — steps whose ENTIRE product is the checkpointed boolean re-applied via ctx.artifacts. clone_repo/hydrate_context now re-run on resume (handler-level idempotency), matching WORKFLOWS.md's intent. - Skipped steps now emit step:<name>:start / :skipped milestones so a watcher sees them accounted for instead of a gap. validator.py - _HANDLER_KINDS (rule 8) is now derived from runner.STEP_HANDLERS — the single source of truth — instead of a hand-copied list that could silently drift. - rule-6 tier ceiling now applies the elevated-only-fields check to read-only (the strictest tier) as well as standard; previously read-only could declare mcp_servers/plugins/skills unchecked. - rule-11 now verifies a deliver_artifact step's target actually produces the declared comment/artifact outcome, not merely that the step kind is present. Tests + corpus updated; rule3/rule7 fixtures switched to s3_and_comment target to stay isolated to their intended rule, plus a new rule11 target-mismatch fixture. Full agent suite green (909 passed), ruff + ty clean. * feat(agent): wire workflow runner into pipeline behind a gate (aws-samples#248 Phase 1, task 5) Builds the runner→pipeline seam and gates it off (default production behavior unchanged). pipeline.run_task's agent invocation now goes through _execute_agent_step: when WORKFLOW_RUNNER_ENABLED is set AND task_type maps to a shipped workflow (only new_task→coding/new-task-v1 in Phase 1), the single run_agent step is dispatched through workflow.run_workflow — exercising the real handler registry, step milestones, and result threading — while clone, context assembly, and post-hooks stay on the proven inline path. Otherwise it is exactly the legacy asyncio.run(run_agent(...)) call. pr_iteration/pr_review have no workflow file until Phase 2b, so they fall back to inline. run_workflow gains an only_kinds filter so the seam drives just the run_agent step (no double clone / double PR against the inline post-hooks). The flag is flipped to default-on in the tasks 6-8 cutover. Folds in the group-B code-review fixes (effective when the seam is enabled): - system prompt: hydrate_context builds ctx.system_prompt via the existing build_system_prompt (was empty), reusing pipeline's logic rather than reimplementing it. - verify gate: new shared _gate_status gives verify_build/verify_lint matching semantics — informational/read_only never gate, regression_only consults build_before/lint_before (mirrors pipeline's build_ok = passed or not build_before), strict always gates. Fixes the regression_only build bug and the verify_lint read_only asymmetry. - clone_repo is idempotent (reuses a pre-populated ctx.setup). - StepContext threads the --trace trajectory into run_agent (was dropped). ensure_pr strategy reconciliation stays deferred to task 8 (it requires removing the task_type branch inside ensure_pr); recorded in local-docs. Full agent suite green (930 passed), ruff + ty clean. * fix(agent): address second-round review findings on the workflow seam (aws-samples#248 Phase 1) - Error-handling contract (highest severity): when the run_agent step's handler raises, run_workflow captures it into a failed StepOutcome rather than propagating. _execute_agent_step now RE-RAISES in that case so run_task's except block restores full fidelity — the log_error_cw APPLICATION_LOGS mirror (read by TaskDashboard / bgagent status), the span error, and the real exception text — instead of silently downgrading to a generic AgentResult. This also makes the previously-dead agent_result-is-None branch live. - _gate_status: an unset gate now defaults to regression_only semantics (gate only a regression; a pre-existing failure does not gate), matching pipeline.py which is unconditionally 'build_ok = passed or not build_before'. Previously an unset gate fell through to strict, contradicting the docstring's 'mirrors pipeline.py' claim and would wrongly fail a broken-before repo. - run_agent handler fails loud on an empty system prompt rather than running an unguided agent loop (repo-less prompt assembly is Phase 3; this turns the gap into an attributable failed step instead of a silent context-free run). - _workflow_id_for_task_type: documented as a transitional bridge tied to the canonical task_type→workflow table in WORKFLOWS.md / API_CONTRACT.md, to be kept in sync until tasks 6-8 delete it. Tests added for each behavior change. Full agent suite green (934 passed), ruff + ty clean. * feat(api,cli): replace task_type with workflow_ref/resolved_workflow (aws-samples#248 tasks 6-7) Breaking API change (no alias): task_type is removed end-to-end; workflow_ref selects a workflow and resolves to a pinned {id, version} at the create-task boundary. CDK/CLI types (task 6, in lockstep — check:types-sync): - Remove TaskType + isPrTaskType; add ResolvedWorkflow. - CreateTaskRequest.workflow_ref?; TaskRecord/TaskDetail/TaskSummary resolved_workflow (+ mappers). - CLI: --pr/--review-pr now set workflow_ref (coding/pr-iteration-v1, coding/pr-review-v1); new --workflow flag; format reads resolved_workflow.id. CDK resolution boundary (task 7): - New workflows.ts: CDK-side descriptor mirror of agent/workflows/** + resolveWorkflowRef (ladder: explicit ref → default/agent-v1), isValidWorkflowRef, requires_repo/read_only/uses_pr accessors. Drift-guard test keeps the table in sync with the shipped YAML. - validation.ts: drop VALID_TASK_TYPES/isValidTaskType; hasTaskSpec now takes the resolved workflow's required_inputs contract. - create-task-core.ts: resolve workflow, validate inputs against the contract, persist workflow_ref + resolved_workflow. Repo stays required for ALL workflows in Phase 1 (repo-less admission is Phase 3). - preflight.ts: taskType param → readOnly + requiresRepo; repo-less short-circuits to passed (seam for Phase 3). - orchestrate-task/context-hydration/orchestrator/ecs-strategy: derive read_only/requires_repo/PR-ness from resolved_workflow; payload swaps task_type → resolved_workflow. ecs bootCommand passes resolved_workflow (lockstep with agent run_task in task 8). CLI 294 tests + CDK suite green; check:types-sync OK. Agent cutover (task 8) lands next to consume resolved_workflow. * feat(agent): complete task_type→workflow cutover in the agent (aws-samples#248 task 8) Removes the Python TaskType enum / PR_TASK_TYPES / _PROMPTS-by-task_type and the gated seam; the workflow runner is now the sole agentic path, driven by the resolved_workflow {id, version} threaded from the create-task boundary. - models.py: delete TaskType enum; TaskConfig gains resolved_workflow, policy_principal, is_pr_workflow (drops task_type). - config.py: build_config takes resolved_workflow (not task_type); derives policy_principal via workflow.policy_principal_for and is_pr_workflow; PR_TASK_TYPES → PR_WORKFLOW_IDS. - Cedar UNCHANGED (Phase 2a owns the principal migration): the Agent::TaskAgent::"<id>" scheme and contracts/cedar-parity/ are untouched. policy_principal_for maps read_only⇒pr_review, else id→legacy {new_task, pr_iteration, pr_review}. Only policy.py edit: drop the WARN-only TaskType() validation. runner.py passes config.policy_principal as the principal. All test_policy* pass unchanged. - prompts rekeyed by workflow id (get_system_prompt falls back to the default coding prompt for an unknown id rather than raising). - pipeline.py: _execute_agent_step loads from resolved_workflow, always-on; the 3 task_type=='pr_review' branches → workflow.read_only; ensure_pr takes an explicit strategy (create/push_resolve/resolve) from the workflow step, resolving the deferred code-review item. - post_hooks.ensure_pr: strategy param replaces task_type self-inspection. repo.py: PR-branch resume keys off config.is_pr_workflow. - server.py/entrypoint.py: thread resolved_workflow; validation keyed on PR workflow ids. - New faithful Phase-1 workflow files coding/pr-iteration-v1 (push_resolve) + coding/pr-review-v1 (read_only, resolve, no Write/Edit per rule 4). All 4 first-party files validate; CDK descriptor drift-guard green. - docs: API_CONTRACT.md migration table + resolved_workflow responses; Starlight mirror synced. Agent suite 922 passed, ruff + ty clean. With tasks 6-7 (e829c4f) this completes the breaking cutover across API + CLI + agent. * fix(agent): use ty ignore syntax + annotate workflow test handlers (aws-samples#248) The full build was never completed on this branch; ty type-checking and ruff format gates were red. No src logic changes — type/format hygiene only. - Tests used mypy-style `# type: ignore[arg-type]`, but the project type checker is ty (`# ty: ignore[rule]`), so the suppression silently no-op'd. - test_workflow_runner.py: annotate inline step handlers as StepHandler-compatible `(step: Step, ctx: StepContext) -> StepOutcome`; type the `handlers` dicts as `dict[str, StepHandler]` (ty treats the alias params as positional-only → dict-value invariance rejects named-param funcs); convert the stale ignore to `# ty: ignore[invalid-argument-type]`. - test_entrypoint.py: assert `resolved_workflow is not None` before subscripting (TaskConfig.resolved_workflow is `dict | None`). - Apply ruff format reflows (runner.py, validator.py, test_workflow_tree_valid.py) the branch was committed without. * feat(agent): Cedar read-only enforcement keys off context.read_only (aws-samples#248 Phase 2a) Migrate read-only enforcement from the literal `Agent::TaskAgent::"pr_review"` principal match to the `context.read_only` attribute, so the Write/Edit hard-deny attaches to the *property* and fires for every read-only workflow uniformly — not just coding/pr-review. Removes the Phase-2b principal bridge. This is the security-load-bearing step: an error here *silently weakens* enforcement (the rule stops matching) rather than failing loudly. Guarded by new contracts/cedar-parity/ golden fixtures verified on BOTH the cedarpy (Python) and cedar-wasm (TypeScript) engines. Policy (both bindings, kept byte-for-byte identical — drift guard enforces): - agent/policies/hard_deny.cedar + cdk/.../builtin-policies.ts: pr_review_forbid_write/edit (principal == "pr_review") → read_only_forbid_write/edit (when context.read_only == true, any principal). Wiring: - policy.py: PolicyEngine gains read_only kwarg; threads read_only into every Cedar request context (probe + base_context). - models.py: TaskConfig.read_only; config.py derives it from the resolved workflow; runner.py passes config.read_only to PolicyEngine. - workflow/loader.py: remove the read_only ⇒ "pr_review" bridge in policy_principal_for — the principal is now an identity/audit tag only; pr-review keeps its own id-derived principal. Parity fixtures (new): read-only-forbid-write, read-only-forbid-edit (read_only=true ⇒ deny), read-only-false-permits-write (read_only=false ⇒ base_permit — proves the deny is gated on the property, not always-on). Tests: TestPrReviewPermissions → TestReadOnlyPermissions, now asserting the deny fires for any read-only principal AND that read_only=false permits Write; test_hooks + test_config updated. agent 927 / cdk 1822 green; ty + ruff clean. Docs: ADR-014 addendum (2026-06-08) records dropping the isolated-PR/ordering requirement (2b shipped first behind the bridge, so read-only was never unprotected); WORKFLOWS.md §"Replacing the Cedar principal" + phasing table updated to past tense. * docs(workflows): resolve repo-optional open questions + freeze schema (aws-samples#248) ADR-014 addendum (2026-06-08) settles the two open questions that gated the Phase-0 schema freeze (WORKFLOWS.md open questions #1, #2). With the one implied schema reshape applied, the schema is frozen — later phases add handlers and plumbing, not schema fields. Decision 1 — Memory actorId for repo-less tasks: per-user user:{cognito_sub}. Caller-scoped (no cross-tenant bleed; mirrors the per-user trace prefix); cross- workflow pooling explicitly not adopted. NO schema field — a fixed platform fallback, applied in memory.py in Phase 3 when repo is absent. Decision 2 — Artifact delivery: named Python deliverers (open target string), shared S3 plumbing pinned now. deliver_artifact.target resolves to a registered deliverer (workflow/deliverers.py → DELIVERERS), same pattern as STEP_HANDLERS — a new delivery method is a registered deliverer, not a schema change. Plumbing frozen: task-scoped key artifacts/{task_id}/, prefix-scoped SessionRole grant, size limit, TaskDetail URL surfacing, workflow:{id} repo-less tenant tag. Schema reshape applied: - workflow.schema.json: steps[].target drops its enum (stays type:string, minLength:1); the closed set moves into the deliverer registry. - models.py: DeliverTarget Literal → str. - New workflow/deliverers.py: Deliverer dataclass + DELIVERERS registry + produced_outcomes(); single source of truth for "what each target produces." - validator.py rule 11: consults the registry (_deliverer_produces) instead of the hardcoded _DELIVER_TARGET_OUTCOMES map. The three first-party names keep their exact produced-outcome sets, so NO existing workflow / fixture / golden vector changes verdict; an unknown deliverer name now produces nothing (new guard, tested). Tests: test_workflow_deliverers.py (registry contract) + a rule-11 unknown-deliverer case. agent 932 green; ty + ruff clean. deliver_artifact remains a NotImplementedError stub until Phase 3 — only the contract is frozen. * feat(api,agent): repo-optional admission for repo-less workflows (aws-samples#248 Phase 3) First slice of Phase 3 (repo-optional tasks): the admission path now accepts a submission with no repo when the resolved workflow does not require one, so a repo-less knowledge workflow (e.g. default/agent-v1) can be admitted, hydrated, and run from task_description alone. The deliverer/S3 + agent-runtime clone-skip slices follow separately. Semantics: requires_repo decides whether a repo is MANDATORY; requires_repo:false means repo-OPTIONAL (a repo-less workflow may still target a repo). The repo-less behavior keys off repo *absence*, not the workflow flag — a repo-optional workflow given a repo still takes the repo-bound path. Types (CDK + CLI mirror, kept in sync — check:types-sync green): - CreateTaskRequest.repo, TaskRecord.repo → optional; TaskDetail.repo, TaskSummary.repo → string | null. Mappers null-coalesce. Admission (create-task-core.ts): resolve the workflow FIRST, then gate repo on workflow.requiresRepo (missing repo on a repo-bound workflow → 400). Onboarding + blueprint-cap lookup runs only when a repo is present; repo-less takes the platform default cap. Record omits repo when absent; replay-required-fields drops 'repo'; event metadata null-coalesces. Hydration (context-hydration.ts): new repo-less branch (keyed on !task.repo) that assembles from task_description + per-user memory, returning before the repo-bound fetch path; assembleUserPrompt drops the Repository: line when repo is absent. Memory (memory.ts + orchestrator.ts): loadMemoryContext/writeMinimalEpisode take an actorNamespace (repo for coding, user:{user_id} for repo-less — ADR-014 addendum). Preflight already honored requiresRepo (Phase-1 seam) — widened its repo param to string|undefined. fanout-task-events skips the GitHub channel for a repo-less task. loadBlueprintConfig skips the RepoTable lookup when repo is absent. Agent (config.py): build_config loads the workflow up-front and requires repo_url/github_token only when requires_repo; repo-less runs from task_description. CLI shows '—' for a repo-less task. Tests: repo-less acceptance (create-task-core, create-task, context-hydration, CLI format) + repo-bound-missing-repo rejection; memory test log key repo → actor_namespace. agent 932 / cdk 1825 / cli 296 green; ty + ruff + eslint clean. deliver_artifact is still a NotImplementedError stub — a repo-less task can be admitted and hydrated but not yet deliver an artifact (next slice). * feat(agent): repo-less pipeline path via the workflow runner (aws-samples#248 Phase 3) Second Phase-3 slice: a repo-less workflow now runs the agent loop in-container with no clone / build / PR. When config.requires_repo is false, run_task delegates to _run_repoless_task, which drives the workflow's steps (hydrate_context → run_agent) through the workflow step runner — coding vs knowledge now differ by the workflow's steps list, per ADR-014's end-state — and assembles a terminal TaskResult with pr_url=None. Scope boundary (deliberate): only hydrate_context + run_agent are driven here. deliver_artifact is the workflow's terminal step but its handler is a stub until the artifacts-bucket S3/IAM contract ships (next slice), so it is excluded via only_kinds; delivery is deferred, the agent run is not. The coding path is left byte-identical — its hard-won inline cancel-safety (skip ensure_pr on a cancelled task) is NOT yet runner-driven, so routing the full coding step list through the runner is a separate cancel-aware change, intentionally not done here. - models.py: TaskConfig.requires_repo (defaults True; coding unaffected). config.py build_config sets it from the resolved workflow. - pipeline.py: requires_repo branch before the repo-coupled segment; _run_repoless_task runs the runner + assembles/persists the terminal result. run_task repo_url now defaults to "" (repo-less callers omit it). - prompts/default_agent.py: repo-less system prompt (no git/branch/PR placeholders); registered for default/agent-v1. - prompt_builder.build_repoless_system_prompt + shared _render_memory_context. Tests: repo-less pipeline run (no setup_repo, no PR, success, repo-less prompt); repo-less prompt has no repo placeholders. agent 934 green; ty + ruff clean. * fix(agent): address PR-review findings on Phase 2a/3 (aws-samples#248) Findings from the plugin review (code-reviewer, silent-failure-hunter, pr-test-analyzer, type-design-analyzer) on 4c34c8b..HEAD. HIGH — repo-less task reported COMPLETED while delivering nothing (silent-failure-hunter). _run_repoless_task skips deliver_artifact (stub until the artifacts contract ships), but the workflow's terminal outcome (e.g. `comment`) was never produced, yet status was success. Now: if the primary terminal outcome is delivery-backed and delivery was skipped, the task is a loud FAILED with an explicit error + delivery_deferred milestone naming the gap. MEDIUM — config.py load-failure fallback bug + fail-open (code-reviewer, silent-failure-hunter). The WorkflowValidationError branch compared workflow_id against DEFAULT_WORKFLOW_ID (the *coding* default) for requires_repo, contradicting its comment, and defaulted read_only=False (fail-open) for unknown ids. Now: log the failure (was silent); fail CLOSED — read_only=True for any id not in _KNOWN_WRITEABLE_WORKFLOW_IDS; requires_repo keyed off the real repo-less default (REPO_LESS_DEFAULT_WORKFLOW_ID = default/agent-v1). MEDIUM — TaskConfig missing cross-field invariant (type-design-analyzer). Added _validate_requires_repo_has_repo (mirrors _validate_trace_requires_user_id): requires_repo=True ⇒ non-empty repo_url, enforced at construction. repo_url and github_token now default to "" so a repo-less TaskConfig is constructible; the validator preserves the repo-bound invariant. LOW/secondary — rule-8 now also checks deliver_artifact.target resolves in DELIVERERS (type-design-analyzer), so an unknown target is caught universally, not only when it collides with the primary outcome. Test gaps closed (pr-test-analyzer): repo-OPTIONAL workflow given a valid repo (admission + hydration); repo-less memory keyed user:{user_id} (asserted); malformed-repo-on-repo-bound rejection; preflight repo-less short-circuit + repo-bound-no-repo invariant; repo-less agent_result=None → FAILED; config load-failure fail-closed; TaskConfig validator; rule-8 deliver target. agent 941 / cdk 1831 / cli 296 green; ty + ruff + type-sync clean. * feat(agent,api): deliver_artifact + repo-less memory — close criterion aws-samples#4 (aws-samples#248 Phase 3) Final Phase-3 slice: a repo-less knowledge task now runs end-to-end — admitted, hydrated, agent loop, AND delivers its output. Closes acceptance criterion aws-samples#4 (a non-coding task runs without a repo) for the s3/comment deliverers. Infra (IAM + env): - agent-session-role.ts: s3:PutObject grant scoped to artifacts/${aws:PrincipalTag/task_id}/* on the trace-artifacts bucket (mirrors the traces/ pattern; task_id-scoped per ADR-014 addendum). cdk-nag reason updated. - agent.ts: ARTIFACTS_BUCKET_NAME runtime env (same bucket as traces). Deliverers (agent/src/workflow/deliverers.py): - DELIVERERS gains a deliver(target, ctx) dispatcher + DeliveryResult. The s3 deliverer uploads the agent's result text to artifacts/{task_id}/result.md via the tenant-scoped S3 client (5 MiB cap); the comment deliverer records a delivered_comment milestone for the channel fanout; s3_and_comment does both. - AgentResult.result_text captures ResultMessage.result (the knowledge-task deliverable); runner.py populates it on success. - runner._handle_deliver_artifact: now implemented (was NotImplementedError), routes through deliver(); a delivery failure is a failed step (terminal FAILED), never a silent skip. Pipeline (pipeline.py): - _run_repoless_task drives the FULL step list (deliver_artifact included); replaces the delivery-deferred FAILED gate with the real runner verdict (agent succeeded but workflow failed ⇒ attributed FAILED). Sets TaskResult.artifact_uri. - Repo-less episodic memory write keyed user:{user_id} (fail-open). Memory (memory.py): write_task_episode repo→actor param; _validate_actor accepts owner/repo OR user:{id} so the same write path serves coding + knowledge tasks. Types: TaskResult/TaskRecord/TaskDetail + CLI mirror gain artifact_uri (mirrors trace_s3_uri); CLI detail view shows an Artifact: line. check:types-sync green. Tests: deliverer dispatch (s3 upload + key/body, comment milestone, s3_and_comment, missing bucket, empty text, oversize); repo-less pipeline now succeeds + sets artifact_uri=null for comment-only; session-role artifacts grant; deliver_artifact dropped from the fail-loud stub list. agent 947 / cdk 1832 / cli 296 green; synth + cdk-nag clean; ty + ruff + type-sync clean. * feat(agent): make repo-less delivery retrievable + ship a knowledge workflow (aws-samples#248 Phase 3) Three should-fix items closing the gap where "criterion aws-samples#4" was met by machinery but the default repo-less path delivered to a channel that isn't wired. 1. Stale docstring: _run_repoless_task said deliver_artifact was a deferred stub — it ships now. Updated to describe the real full-step-list run. 2. Default workflow delivers retrievably. The channel fan-out only knows slack/email/github; an api-origin repo-less task (the common default case) has no channel, so a `comment`-only deliverer surfaced nothing the user could get. default/agent-v1 now uses `target: s3_and_comment` (primary outcome `artifact`): the S3 upload to artifacts/{task_id}/ is always retrievable, the comment milestone is still rendered by the fan-out when a channel exists. 3. Ship a runnable knowledge workflow: agent/workflows/knowledge/web-research-v1.yaml — the concrete repo-less demonstration (research → S3 artifact, no repo). Built-in capabilities only (registry MCP/skills are Phase 4); CDK descriptor mirror + drift-guard + resolver tests updated. Also fixes a latent bug the knowledge workflow surfaced: a repo-less id with no registered prompt fell back to the CODING prompt (leaking unsubstitutable {repo_url}/{branch_name}). get_system_prompt gains repo_less= so repo-less workflows fall back to the repo-less default-agent prompt instead. Tests: repo-less pipeline now asserts artifact_uri set (s3_and_comment); repo-less prompt fallback; knowledge workflow validates/resolves. agent 949 / cdk 1833 green; ty + ruff + type-sync clean; docs synced. * fix(agent): address full-branch review — 2 repo-less blockers + cleanups (aws-samples#248) Findings from the four-reviewer full-branch pass. Two were blockers that would have shipped the repo-less feature broken despite green unit tests (the tests didn't cross the server/task_state boundaries). BLOCKER 1 — repo-less task rejected on the AgentCore backend. server.py _validate_required_params required repo_url unconditionally, so /invocations returned 400 for every repo-less task before the pipeline ran (ECS backend dodged it by calling run_task directly). Now gates repo_url on the workflow's requires_repo (resolved via load_workflow; fails SAFE — assume required — on a load error). Test: repo-less payload accepted, repo-bound still requires repo. BLOCKER 2 — artifact_uri computed but never persisted. task_state.write_terminal's field allowlist had no artifact_uri branch, so the URI was dropped before DynamoDB and TaskDetail.artifact_uri was always null (delivery not retrievable despite the S3 object existing). Added the persist branch + tests. Secondary: - comment deliverer no longer overstates: _post_comment docstring + WORKFLOWS.md state plainly that delivered_comment is recorded for the event stream (visible in `bgagent watch`) but is NOT yet routed to an external channel (not in the fan-out ROUTABLE_MILESTONES). comment_posted=True only when actually recorded. - test pins the config→engine read_only seam (runner.py PolicyEngine(read_only=)): dropping it now fails a test instead of silently disabling the Write/Edit deny. - stale phase-boundary comments corrected (post_review docstring, runner module docstring + only_kinds + run_agent guard, pr-review/pr-iteration YAML headers) — they described Phase 2a/2b/3 as pending when those shipped on this branch. - ADR-014: replaced the dangling pre-rebase commit hash 838c72e (unreachable after rebase) with a descriptive reference. agent 954 / cdk 1833 green; ty + ruff + type-sync clean; docs synced. * test(cli): cover artifact_uri render lines (aws-samples#248 Phase 3) The Artifact: display lines added to formatTaskDetail / formatStatusSnapshot introduced uncovered truthy branches that dropped CLI branch coverage below the 84% gate (full `mise run build` caught it; the per-package run had passed on the prior content). Adds non-null + null cases for both renderers, mirroring the existing trace_s3_uri tests. cli 300 passed; full build green. * fix(agent): address PR review findings + green CI (aws-samples#248) CI: ruff-format the 4 files that tripped the build self-mutation (config.py, deliverers.py, test_pipeline.py, test_task_state.py) so "Fail build on mutation" passes. Review findings: - prompts/__init__.py: get_system_prompt now logs WARN when an id has no registered prompt and falls back, restoring the signal the old build_system_prompt emitted (silent-failure-hunter). - pipeline._run_repoless_task: chain wf_result.failed_step.error into the synthesized AgentResult so a run_agent failure on the repo-less path is diagnosable instead of a generic message (code-reviewer). - server._validate_required_params: narrow the broad `except Exception` around load_workflow to WorkflowValidationError/ImportError so a genuine programming error surfaces loudly instead of being masked as "could not resolve requires_repo" (still fails safe — repo required). - workflows.ts workflowIsReadOnly: document why the unknown-id default is `false` (conservative admission: needsWrite=!readOnly) and must NOT be "aligned" to the agent-side fail-closed read-only default. - runner.py / pr-review-v1.yaml: comment fixes — artifact_key→artifact_uri, drop never-written review_posted, rule 4→6, post_review handler is registered-but-unimplemented not "no shipped handler" (comment-analyzer). Tests (pr-test-analyzer gaps): - test_memory.py: cover _validate_actor (user:{id} namespace + rejects) and a write_task_episode(actor="user:...") reaching create_event. - test_pipeline.py: cover the repo-less delivery-gate-failure branch (agent succeeds but deliver_artifact fails → terminal FAILED naming the step). Docs: ROADMAP.md "Task types" now reflects workflow-driven tasks + repo-less knowledge workflows; Starlight mirror regenerated. agent 962 passed, ruff+ty clean, cdk compile+workflows tests green. * fix(agent): ship first-party workflow files in the image + sync user docs (aws-samples#248) The Dockerfile copied agent/src, agent/policies, and contracts but never agent/workflows, so the runtime loader (which resolves /app/workflows/<domain>/ <name>-vN.yaml) failed every workflow load with WorkflowValidationError. Add COPY agent/workflows/ /app/workflows/ so the five shipped workflows + the JSON schema land in the image. Verified in a local build: load_workflow( 'coding/new-task-v1') resolves to 1.0.0. Also bring the user-facing guides in line with the workflow-driven model: USER_GUIDE/QUICK_START now document workflow_ref, the --workflow flag, and resolved_workflow responses (the retired task_type is shown only as a migration mapping). Rename the "Task types" section to "Workflows", update the Starlight sync map + sidebar slug, and regenerate the mirrors. * fix(agent,api,cli): address PR-review findings on the workflow seam (aws-samples#248) Resolve five verified findings from the aws-samples#248 review: HIGH — thread agent_config.allowed_tools to the SDK. runner.py hardcoded the full tool surface, so a workflow's declared allowed_tools never reached ClaudeAgentOptions and the read_only:false read-leaning lanes (default/agent-v1, web-research-v1) could still get Write/Bash. Add TaskConfig.allowed_tools (populated from the resolved workflow), resolve it in run_agent via _resolve_allowed_tools (drops Write/Edit when read_only), and add SDK-layer tests. Cedar missing-read_only fail-open — add the ADR-014-promised parity vector. With context omitting read_only, both engines fail OPEN (Allow + base_permit); safety rests on policy.py::_eval_tier re-raising on diagnostics.errors. Add contracts/cedar-parity/read-only-missing-attribute.json (verified on cedarpy AND cedar-wasm) plus test_policy.py tests pinning the re-raise + always-inject discipline. MEDIUM — repo-less artifact success gate. pipeline.py now fails the task when the primary outcome is `artifact` but no artifact_uri was produced (matches the WORKFLOWS.md "agent-success AND S3 key present" contract), not just when the deliver step raised. MEDIUM — rule 13 model allow-list admission. Add WORKFLOW_MODEL_ALLOWLIST + disallowedWorkflowModel() and enforce it in create-task-core (unpermitted model => 400, no silent downgrade); descriptor sync-test asserts modelId matches the YAML and stays on the allow-list. LOW — CLI repo-less submission. --repo is no longer a hard requiredOption: required unless --workflow selects a repo-less workflow (still required with --pr/--review-pr). Lets bgagent reach knowledge/web-research-v1. Rule 10 (one production version per lineage) remains intentionally unenforced at the single-file layer — it is a registry/promotion-time property. * fix(agent,api,cli): address PR-review aws-samples#296 resolution-ladder + deliverer findings (aws-samples#248) Resolve the remaining findings from isadeks's review (findings #2/aws-samples#4 were already fixed in the prior commit): #1 — bare submit silently stopped opening PRs. The server ladder resolves an absent workflow_ref to the repo-less default/agent-v1 (by design), so a CLI `submit --repo X --task Y` regressed from "opens a PR" to "S3 artifact". The CLI now sends coding/new-task-v1 explicitly when a repo is present and no workflow/ pr flag is given, preserving the old new_task default. aws-samples#3 — repo-optional + repo disagreed across the boundary. The agent took the repo-less path off config.requires_repo alone, while the orchestrator built a repo-bound prompt off repo presence. pipeline.run_task now takes the repo-less path only when requires_repo is false AND no repo was supplied; a repo-optional workflow given a repo runs the repo-bound path (clone/build/PR), matching CDK admission and the prompt. aws-samples#6 — resolveWorkflowRef silently discarded a @Version constraint. It now honors the constraint (Phase 1: must match the single shipped version) and returns null otherwise; create-task-core 400s with a distinct "version not available" message via resolveWorkflowRefError, instead of quietly running the shipped version. aws-samples#7 — deliver_artifact unset-target default disagreed between validator (lenient full set) and runtime (s3). Introduce DEFAULT_DELIVER_TARGET as the single source of truth; produced_outcomes(None) now models that exact default, so a primary:comment workflow that omits target is correctly flagged by rule 11. aws-samples#10 — descriptor/YAML drift had no parity test. The sync test now asserts required_inputs (allOf/oneOf) parity in addition to id/version/requires_repo/ read_only/model, and a new test cross-checks the agent-side _KNOWN_WRITEABLE_WORKFLOW_IDS (the third hand-maintained copy) against the writeable repo-bound descriptors so drift fails CI. * fix(agent): close PR-review aws-samples#296 follow-ups — post-hook fail-soft, prompt, cap, parity (aws-samples#248) aws-samples#5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload failure AFTER run_agent mutated/committed the tree stranded the work as FAILED with no PR. read_only now comes from config (already fallback-computed and the verdict that drove Cedar); the reload — needed only for the ensure_pr strategy — is wrapped in the same WorkflowValidationError fallback build_config uses, defaulting to "create" so the PR still opens. aws-samples#8 — knowledge/web-research-v1 had no registered prompt and silently degraded to the generic default-agent prompt. Add a research-specialized, repo-less prompt (prompts/web_research.py) and register it. aws-samples#9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap actually bounds memory on the MicroVM before materializing the bytes. aws-samples#10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID / DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform default and be requires_repo:false in the YAML; the coding default must be repo-bound. Closes the last hand-maintained-copy axis aws-samples#10 named. Perf — memoize load_workflow (@cache): files are image-baked and immutable per process and Workflow is frozen, so the 3-4 loads per task now parse once. Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a larger half-migrated-runner refactor both reviewers flagged as non-blocking. --------- Co-authored-by: bgagent <bgagent@noreply.github.com>
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a new roadmap iteration (3e) and updates design documents to address memory poisoning — classified as OWASP ASI06 (Memory & Context Poisoning) in the 2026 Top 10 for Agentic Applications. Based on a deep research report covering 33 sources (academic papers, CVE disclosures, OWASP standards, and industry analyses).
Changes across 3 files (docs only, no code changes):
Key findings from research
loadMemoryContext()injects retrieved memory without any validation9 identified gaps
loadMemoryContext()4-phase implementation roadmap
Test plan